Is it possible to decode Unicode symbols from a UTF-8 format?

cybershark5886

Is it possible to decode Unicode symbols (In *.DAT files) from a UTF-8 format?

I have looked through certain .DAT files on my computer and noticed a lot of nonsense characters (control characters, numbers, and a few intelligible letters), and some even show up as blocks (open rectangles). I assume there used to be code in their place and what I see is the result of encryption... right?

I've been reading up on certain UTF formats (I think Windows 98 runs on UTF-8, right?) and their limitations when processing Unicode characters outside their character mapping range.

For instance, a Hebrew or Chinese character might be &#6382, and since that's too big a number to fit into the ASCII/UTF-8 (what's the difference?) character set, it breaks the number down into several readable pieces (up to 4), which appear as individual (ASCII-readable) characters instead of the single Chinese character. So "1 Chinese character" (might) = 4 characters: A7£æ. A7£æ doesn't make sense on its own, but a decoder (I assume) might combine these 4 characters, recognize that the original was out of range of the character set, and translate it back into the Chinese character.
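A note on the mechanism guessed at above: UTF-8 really does store characters outside the ASCII range as two to four bytes, and a viewer that treats each byte as its own character shows several unrelated symbols. Here is a minimal Python sketch of that round trip, assuming the viewer misreads the bytes as Latin-1. The helper names are made up for illustration, and the sample characters are not taken from any real .DAT file.

```python
# Sketch: one non-ASCII character becomes several bytes in UTF-8.
# Assumption: the "wrong" viewer interprets every byte as Latin-1.

def utf8_bytes(ch: str) -> bytes:
    """Encode a single character as UTF-8 bytes."""
    return ch.encode("utf-8")

def misread_as_latin1(data: bytes) -> str:
    """Show what a byte-per-character viewer would display."""
    return data.decode("latin-1")

def recover(garbled: str) -> str:
    """Undo the misreading: back to raw bytes, then decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")

samples = {
    "Hebrew alef": "\u05d0",   # 2 bytes in UTF-8
    "CJK 'middle'": "\u4e2d",  # 3 bytes in UTF-8
    "emoji": "\U0001f600",     # 4 bytes in UTF-8
}

for name, ch in samples.items():
    raw = utf8_bytes(ch)
    garbled = misread_as_latin1(raw)
    restored = recover(garbled)
    print(f"{name}: U+{ord(ch):04X} -> {raw.hex(' ')} "
          f"({len(raw)} bytes), restored correctly: {restored == ch}")
```

This only helps when the bytes really are UTF-8 text. Random binary data fed to `recover` will usually raise `UnicodeDecodeError`, which is itself a hint that the file is not encoded text.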


That's what I'm guessing anyway. What I DON'T know is where to find a decoder that might make sense out of "A7£æ" and turn it into its original character.

I'm not suggesting that Windows programmers wrote their .DAT files in Chinese characters. ;) But why else would the .DAT file hold all those blocks and strings of nonsense? What is really behind those characters? Is it really just control characters setting parameters for the file, or is it an encryption of something else? Do I even need a decoder to find out what was behind that string of nonsense characters?
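One way to see what is "behind" the blocks is to look at the raw byte values rather than at whatever glyph a text editor picks for them. The following is a small hex-dump sketch in Python. `hex_dump` is a hypothetical helper written for this note, not part of any tool mentioned in the thread, and the sample bytes are invented.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return a classic hex dump: offset, hex bytes, printable ASCII."""
    pad = width * 3 - 1  # width of the hex column, including spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Bytes outside printable ASCII (0x20-0x7e) are shown as dots.
        text_part = "".join(chr(b) if 0x20 <= b <= 0x7e else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)

# Invented sample: two control bytes, the text "Hi", and two high bytes.
sample = bytes([0x00, 0x01, 0x48, 0x69, 0xff, 0x7f])
print(hex_dump(sample))
```

To try it on a real file, read the first few hundred bytes in binary mode, for example `hex_dump(open(path, "rb").read(256))` with `path` pointing at a file of your choice.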

I'd appreciate any speculation on why .DAT files look like they do. It's bugging me to death. Thnx. :)
 
It'll be source code in a .dat file. You'll need a program that can open the .dat file and get to its innards. It is impossible to decode this and view it in Notepad, as I imagine you are doing, because Notepad doesn't support it.

Visual Studio or another programming tool will probably do the trick.
 
Visual Studio is good.
But another good one (that is also free and already installed on your computer) is the edit program.
Use the DOS prompt and type edit.
This program has a much larger character set than Word (which I think is limited to [space] -> ~).

That's 0x20 -> 0x7E hex.
MS-DOS has character support for all 256 chars of extended ASCII.
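The two ranges mentioned here can be printed directly. This is a sketch in Python, assuming the DOS machine uses code page 437 (the original US code page). Other DOS code pages assign different glyphs to the upper 128 values.

```python
# Printable ASCII: space (0x20) through tilde (0x7e). 0x7f is DEL, a control code.
printable_ascii = bytes(range(0x20, 0x7f))
print(printable_ascii.decode("ascii"))

# The upper half (0x80-0xff) has no meaning in plain ASCII.
# Assumption: interpret it with DOS code page 437.
high_half = bytes(range(0x80, 0x100))
print(high_half.decode("cp437"))
```

The first line of output holds 95 characters. The second holds 128 accented letters, box-drawing pieces, and symbols, which is why a DOS editor can show a distinct glyph for bytes that other editors render as blocks.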
 
Thnx for your help guys! Visual Studio? I have that... I think. If it's the expensive set of disks that has VB, Visual C++, Java and other programming languages then, yeah, I have it. Which one of those programs would I need to "decode" it? VB? Or was Windows written more in C? And once I'm IN the program, what do I do to open the file and read the source code?


Use the DOS prompt and type edit.
This program has a much larger character set than Word (which I think is limited to [space] -> ~).

That's 0x20 -> 0x7E hex.
MS-DOS has character support for all 256 chars of extended ASCII.



Really? I've known about the edit command forever, but I never knew that it supported more characters. Why would it? It's a DOS program! Logically, shouldn't Microsoft have made the OSes that came later support MORE character sets? If not, then I find that extremely funny, because I now have another insult to throw at Microsoft. :p
 
Just for the record, I'm not one of these "amateurs" who just happened to learn a little about computers and now want to jump into the deep stuff. Computers are my hobby, and I've all but exhausted myself by taking every class that my high school offers on computers. I've read 1,000-page books on computer hardware and 1,000-page books on computer software. I plan to be a Computer Programmer or a Software Engineer (yes, there is a difference). I'm not asking these questions so that I can 'crack'. I want to know this stuff because I want to 'hack' in the traditional sense, in which 'hackers' were people who loved exploring source code to better their knowledge of how something works.

I know better than to 'crack'. In fact I find it immoral.

But anyway, I really want to know how I can view this source code, because seeing a jumbled mess in a .dat file that is controlling big programs (which obviously make sense of it) and that I can't read bugs the stew out of me. If I'm going to be a programmer, I plan on being a 'competent' programmer. So I'd appreciate any further help on this. Thnx. :)


P.S. Oh, and one more thing. Some .DLLs have this jumbled mess of encoded figures too. Could their source code also be ascertained with Visual Studio, like the .DAT file, with the proper program? Because I sure as heck know that you can MAKE DLLs with VB (and I'm sure C++ also).
 
You can disassemble programs using Visual Studio .NET by just opening the files.
A DLL file will have been compiled, so you'll obviously need a disassembler.
You can open the file you want with the text editor that comes with Visual C (assuming that you have Visual Studio 6), as Visual C had the best text editor.
DOS has support for all the characters because when it was made people used to have to write programs in native code.
Notepad is a limited text editor and only has support for characters that you can actually type using the keyboard. -is that stupid or a step backward? you decide.
Another program you may be interested in is turbo pad. It's an open-source project on SourceForge, and it's probably one of the best text editors around. -with good character support as well.
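On the DLL point, a quick check before reaching for a disassembler is whether a file is a compiled Windows binary at all. This Python sketch assumes the 32-bit PE layout: an "MZ" signature at offset 0, a 4-byte little-endian pointer at offset 0x3C, and a "PE\0\0" signature at the pointed-to offset. `looks_like_pe` is a hypothetical helper name, and older 16-bit DLLs (NE format) will return False here.

```python
import struct

def looks_like_pe(data: bytes) -> bool:
    """Return True if the bytes start like a 32-bit Windows PE file."""
    # Every DOS/Windows executable begins with the two bytes "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # Offset 0x3C holds the file offset of the PE header (little-endian).
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"

# Invented minimal header for demonstration only, not a loadable DLL.
fake_header = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x40) + b"PE\x00\x00"
print(looks_like_pe(fake_header))
print(looks_like_pe(b"just some text, not a binary" * 4))
```

A file that passes this check holds machine code and tables rather than text, so a text editor will show mostly blocks and stray symbols regardless of its character set. A .DAT file that fails it is simply some other application-specific binary or text format.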
 