07-31-2004, 05:14 PM   #1
Baseband Member
Join Date: Jul 2004
Posts: 23
Is it possible to decode Unicode symbols (in *.DAT files) from a UTF-8 format?

I have looked through certain .DAT files on my computer and noticed a lot of nonsense characters (control characters, numbers, and a few intelligible letters), and some even show up as blocks (open rectangles). I assume there used to be code in their place and that what I see is the result of encryption... right?

I've been reading up on certain UTF formats (I think Windows 98 runs on UTF-8, right?) and their limitations when they try to process Unicode characters outside their character mapping range.

For instance, a Hebrew or Chinese character might be &#6382;, and since that's too big a number to fit into the ASCII/UTF-8 (what's the difference?) character set, it breaks the number down into several readable pieces (up to 4 of them), and they appear as (4) individual (ASCII-readable) characters instead of the single Chinese character. So "1 Chinese character" (might) = (4 characters) A7. A7 doesn't make sense on its own, but with a decoder (I assume) it might combine these 4 characters, recognize that the original had been out of range of the character set, and translate it back into the Chinese character again.
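
For reference, here is a minimal Python sketch of the byte-splitting idea, assuming the data really is UTF-8 text. The helper name `utf8_bytes` and the sample characters are made up for illustration and do not come from any particular .DAT file. In UTF-8, plain ASCII characters stay one byte each, while other characters become two to four bytes, all with values of 0x80 or above, so the pieces are not themselves ASCII letters.

```python
def utf8_bytes(ch: str) -> list[str]:
    """Return the UTF-8 encoding of one character as a list of hex byte strings."""
    return [f"{b:02X}" for b in ch.encode("utf-8")]

# Illustrative samples: Latin A, section sign, Hebrew alef, a CJK ideograph, an emoji.
samples = ["A", "\u00a7", "\u05d0", "\u4e2d", "\U0001f600"]
for ch in samples:
    print(f"U+{ord(ch):04X} -> {' '.join(utf8_bytes(ch))}")

# Decoding reverses the split: three bytes become one character again.
raw = bytes([0xE4, 0xB8, 0xAD])
decoded = raw.decode("utf-8")
print(f"{' '.join(f'{b:02X}' for b in raw)} -> U+{ord(decoded):04X}")

# Reading the same bytes with a single-byte Windows code page yields one
# unrelated character per byte, which is how multi-byte text turns into noise.
misread = raw.decode("cp1252")
print([f"U+{ord(c):04X}" for c in misread])
```

So the "decoder" for genuine UTF-8 is just a UTF-8 decode step. Whether a given .DAT file contains UTF-8 text at all is a separate question.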


That's what I'm guessing anyway. What I DON'T know is where to find a decoder that might make sense out of "A7" and turn it into its original character.

I'm not suggesting that Windows programmers wrote their .DAT files in Chinese characters. But why else would a .DAT file hold all those blocks and strings of nonsense? What is really behind those characters? Is it really just control characters setting parameters for the file, or is it an encryption of something else? Do I even need a decoder to find out what was behind that string of nonsense characters?

I'd appreciate any speculation on why .DAT files look like they do. It's bugging me to death. Thnx.
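
One likely explanation, sketched in Python: many .DAT files hold binary data (numbers stored as raw bytes) rather than text, so a text editor shows whatever character each byte happens to map to. The record layout below is entirely hypothetical and invented for illustration; real .DAT formats vary by program and are often undocumented.

```python
import struct

# Hypothetical record layout (an assumption, not a real file format):
# 4-byte signature, 16-bit version, 32-bit count, 32-bit float, little-endian.
LAYOUT = "<4sHIf"
record = struct.pack(LAYOUT, b"DATA", 3, 1000, 1.5)

def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values, and printable-ASCII text columns."""
    pad = width * 3
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Bytes outside printable ASCII are shown as dots, like most hex viewers.
        text_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}} {text_part}")
    return "\n".join(lines)

print(hexdump(record))

# With the layout known, the "nonsense" unpacks back into ordinary values.
print(struct.unpack(LAYOUT, record))
```

Only the signature is readable as text; the numbers show up as control characters and stray symbols. Nothing here is encrypted, it is simply not stored as text.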
Posted by cybershark5886

07-31-2004, 07:29 PM   #2
Guru
Join Date: Dec 2003
Location: Britain
Posts: 13,293

It'll be source code in a .dat file. You'll need a program that can open the .dat file and get to its innards. It is impossible to decode this and view it in Notepad, as I imagine you are doing, since Notepad doesn't support it.

Visual Studio or another programming tool will probably do the trick.
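
As a rough way to get at the readable parts of such a file without any special tool, here is a small Python sketch that pulls runs of printable ASCII out of raw bytes, similar in spirit to the Unix `strings` utility. The function name `printable_strings` and the sample bytes are hypothetical. Note that this recovers embedded text only; a .dat file generally holds data for a program rather than its source code, so there may be no source to recover.

```python
import re

def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least min_len printable ASCII bytes found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]

# Made-up sample bytes: readable fragments separated by binary noise.
sample = b"\x00\x01CONFIG\x00\xff\x10v1.2\x00ab\x00C:\\WINDOWS\\SYSTEM\x00"
for text in printable_strings(sample):
    print(text)
```

To try it on a real file, read the file in binary mode, for example `open(path, "rb").read()`, and pass the result to the function.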
Posted by Lord Kalthorn

08-01-2004, 07:20 AM   #3
Site Team
Join Date: Mar 2004
Posts: 7,999
Re: Is it possible to decode Unicode symbols from a UTF-8 format?

Visual Studio is good.
But another good one (that is also free and already installed on your computer) is the edit program.
Use the DOS prompt and type edit.
This program has a much larger character set than Word (which I think is limited to [space] -> ~).

That's 0x20 -> 0x7E hex.
MS-DOS has character support for all 256 chars of extended ASCII.
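
A short Python sketch of the ranges mentioned here. The printable ASCII range is standard; the choice of code pages is an assumption, since DOS programs commonly used code page 437 on US systems while Windows programs commonly used Windows-1252, and other locales differ. The helper name `describe` is made up for illustration.

```python
import unicodedata

# Printable ASCII runs from 0x20 (space) to 0x7E (~); 0x7F is the DEL control code.
printable_ascii = bytes(range(0x20, 0x7F)).decode("ascii")
print(len(printable_ascii), repr(printable_ascii[0]), repr(printable_ascii[-1]))

def describe(byte_value: int, codec: str) -> str:
    """Name the character that a single byte maps to under the given codec."""
    ch = bytes([byte_value]).decode(codec)
    return f"0x{byte_value:02X} in {codec}: U+{ord(ch):04X} {unicodedata.name(ch)}"

# The same byte above 0x7F means different things in different code pages.
for codec in ("cp437", "cp1252"):
    print(describe(0xC9, codec))
```

This is why the same binary file can look different in edit and in a Windows editor: the bytes are identical, but each program draws the upper 128 values using a different table.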
Posted by root

08-01-2004, 12:46 PM   #4
Baseband Member
Join Date: Jul 2004
Posts: 23

Thnx for your help guys! Visual Studio? I have that... I think. If it's the expensive set of disks that has VB, Visual C++, Java and other programming languages then, yeah, I have it. Which one of those programs would I need to "decode" it? VB? Or was Windows written more in C? And once I'm IN the program, how do I open the file to read the source code?

Quote:
Originally Posted by root
Use the DOS prompt and type edit.
This program has a much larger character set than Word (which I think is limited to [space] -> ~).

That's 0x20 -> 0x7E hex.
MS-DOS has character support for all 256 chars of extended ASCII.

Really? I've known about the edit command forever, but I never knew that it supported more characters. Why would it? It's a DOS program! Logically, shouldn't Microsoft have made the OSes that came later with MORE character set support? If not, then I find that extremely funny, because I now have another insult to throw at Microsoft.
Posted by cybershark5886

08-01-2004, 12:59 PM   #5
Baseband Member
Join Date: Jul 2004
Posts: 23

Just for the record, I'm not one of these "amateurs" who just happened to learn a little about computers and now want to jump into the deep stuff. Computers are my hobby, and I've all but exhausted myself by taking every class that my high school offers on computers. I've read 1,000-page books on computer hardware and 1,000-page books on computer software. I plan to be a computer programmer or a software engineer (yes, there is a difference). I'm not asking these questions so that I can 'crack'. I want to know this stuff because I want to 'hack' in the traditional sense, in which 'hackers' were people who loved exploiting source code to better their knowledge of how something works.

I know better than to 'crack'. In fact I find it immoral.

But anyway, I really want to know how I can view this source code, because seeing a jumbled mess in a .dat file that controls big programs (which obviously make sense of it) and that I can't read bugs the stew out of me. If I'm going to be a programmer, I plan on being a 'competent' programmer. So I'd appreciate any further help on this. Thnx.

P.S. Oh, and one more thing. Some .DLLs also have this jumbled mess of encoded figures. Could their source code also be ascertained with Visual Studio, like the .dat file, given the proper program? Because I sure as heck know that you can MAKE DLLs with VB (and I'm sure C++ also).
Posted by cybershark5886

08-02-2004, 06:25 AM   #6
Site Team
Join Date: Mar 2004
Posts: 7,999
Re: Is it possible to decode Unicode symbols from a UTF-8 format?

You can disassemble programs using Visual Studio .NET by just opening the files.
A DLL file will have been compiled, so you'll obviously need a disassembler.
You can open the file you want with the text editor that comes with Visual C (assuming that you have Visual Studio 6), as Visual C had the best text editor.
DOS has support for all the characters because when it was made people used to have to write programs in native code.
Notepad is a limited text editor and only has support for characters that you can actually type using the keyboard. Is that stupid or a step backward? You decide.
Another program you may be interested in is turbo pad. It's an open-source project on SourceForge, and it's probably one of the best text editors around, with good character support as well.
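
Before reaching for a disassembler, it can help to confirm that a file is compiled code at all. A minimal Python sketch, assuming the standard Windows executable layout: a DLL or EXE starts with the bytes "MZ", and the 4 bytes at offset 0x3C give the position of a "PE" signature followed by two zero bytes. The function name `looks_like_pe` and the fake test data are made up for illustration.

```python
import struct

def looks_like_pe(data: bytes) -> bool:
    """Check for a DOS 'MZ' header whose offset field points at a PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # The 32-bit little-endian value at 0x3C is the offset of the PE header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"

# Build a tiny fake header for demonstration; this is not a loadable DLL.
fake = bytearray(0x80)
fake[:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x40)
fake[0x40:0x44] = b"PE\x00\x00"

print(looks_like_pe(bytes(fake)))
print(looks_like_pe(b"just some plain text, not a DLL"))
```

A file that passes this check contains machine code, so a disassembler shows assembly instructions. The original VB or C++ source is not stored in the file.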
Posted by root

08-02-2004, 05:15 PM   #7
Baseband Member
Join Date: Jul 2004
Posts: 23

Thanks man. I appreciate it. And yes, I do have Visual Studio 6. Thnx again.
Posted by cybershark5886