How to automatically identify text in wrong encoding? - bug-code.com Q&A

How to automatically identify text in wrong encoding?

There is a database that a third-party program writes to, and my task is to pull data from it for reporting. Overall, everything is written and working except for one awkward issue: text in the table is periodically saved in the wrong encoding, so it looks something like this:
R—R°RRR° RRR±SS SR°СЃС...РѕРґРЅРеРєРе.docx
It is fixed by the usual conversion from Windows-1251 to UTF-8.
The question is how to automatically determine that the text has been stored incorrectly, other than by checking for the presence of characters like "°". Is there another, more robust method?

2 Answers

For example, you could look at how encoding autodetection is implemented in Far Manager, or search for something similar. These detectors typically rely on character-code statistics: they read the file from the start until the statistics become more or less unambiguous, then guess the encoding. Far determines the encoding quite successfully in most cases.
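
To make the statistical idea concrete, here is a minimal Python sketch. It is not Far Manager's actual implementation; the function name `guess_encoding`, the candidate list, and the choice of ten frequent lowercase Russian letters as the statistic are all assumptions made for illustration. It decodes the raw bytes under each candidate encoding and keeps the one whose result contains the largest share of those frequent letters.

```python
# Candidate encodings to try; this list is an assumption, extend it as needed.
CANDIDATES = ["utf-8", "cp1251", "koi8-r", "cp866"]

# Ten of the most frequent Russian letters, lowercase only (an assumed statistic).
COMMON = set("оеаинтсрвл")


def guess_encoding(data: bytes) -> str:
    """Return the candidate encoding whose decoding looks most like Russian text."""
    best_name, best_score = CANDIDATES[0], -1.0
    for name in CANDIDATES:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding at all
        # Share of characters that are frequent lowercase Russian letters.
        score = sum(ch in COMMON for ch in text) / max(len(text), 1)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


sample = "привет, это обычный русский текст"
print(guess_encoding(sample.encode("koi8-r")))
```

Counting only lowercase letters is deliberate: the same bytes read under the wrong single-byte Cyrillic encoding tend to come out as uppercase letters or symbols, so the wrong guesses score low. On very short strings the statistics are weak and the guess can be wrong.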

Or, when you have a hint, such as knowing that the file begins with Russian text, you can simply count how many characters fall within the set of Russian letters under several candidate transcodings.
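
A rough Python sketch of that counting approach, applied to strings that are already stored in the table. The names `russian_score` and `fix_mojibake` and the list of transcoding pairs are hypothetical, and it assumes no bytes were lost when the text was saved. Each pair is tried as "encode back with the wrong codec, decode with the right one", and a candidate replaces the original only if a larger share of its characters are Russian letters.

```python
# (wrong_codec, right_codec) pairs to try; this list is an assumption.
PAIRS = [
    ("cp1251", "utf-8"),
    ("cp866", "utf-8"),
    ("koi8-r", "utf-8"),
    ("latin-1", "utf-8"),
]

RUSSIAN_LETTERS = set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя")


def russian_score(text: str) -> float:
    """Share of non-whitespace characters that are Russian letters."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    hits = sum(1 for ch in visible if ch.lower() in RUSSIAN_LETTERS)
    return hits / len(visible)


def fix_mojibake(text: str) -> str:
    """Return the most Russian-looking transcoding, or the text unchanged."""
    best, best_score = text, russian_score(text)
    for wrong, right in PAIRS:
        try:
            candidate = text.encode(wrong).decode(right)
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this pair cannot have produced the stored text
        score = russian_score(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


# Simulate the damage: UTF-8 bytes misread as Windows-1251.
broken = "Заказ на работу".encode("utf-8").decode("cp1251")
print(fix_mojibake(broken))
```

A row is then "stored incorrectly" when `fix_mojibake(value) != value`. Correctly stored Russian text usually fails the strict round trip and is returned unchanged, which makes this more reliable than looking for a single symbol such as "°".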