Working with text: how to determine the encoding of a file

Let's work out how to find a file's encoding. Put simply, an encoding is a table that maps the characters of an alphabet to byte values. Each language has its own set of such codes, and some languages have several. Sometimes you need to determine which encoding a file uses. We will look at this using a text document as an example.
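To make the idea concrete, here is a minimal Python sketch showing that one and the same letter turns into different bytes under different encodings. The helper name encode_all and the choice of encodings are assumptions made for illustration; they are not taken from the tools described in this article.

```python
# Encode the same text with several encodings and compare the resulting bytes.
def encode_all(text, encodings):
    """Return a dict mapping each encoding name to the bytes it produces."""
    return {name: text.encode(name) for name in encodings}


# The names below are Python codec names, picked only as examples.
codes = encode_all("Я", ["cp1251", "koi8_r", "cp866", "utf-8"])
for name, data in codes.items():
    print(f"{name:8} {data.hex()}")
```

Each line of the output shows a different byte sequence, which is why a program that assumes the wrong encoding displays the wrong letters.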

What you need

A set of software tools. To begin with, applications such as Word, KWrite and the Firefox browser are enough, plus the enca recognition tool.

You can determine a file's encoding with the Microsoft Word editor. First it must be installed from the Office package. Once the application is installed and can be opened via its W-shaped icon on the desktop, go to the next step.

The next stage of recognition

In the application's menu bar, choose "File", then "Open". The keyboard shortcut Ctrl + O does the same.

Then, in the dialog box, go to the desired directory and select the file to read. Click it with the mouse and press the "Open" button.

When the file does not match the CP1251 character set, the application tries to determine the encoding itself and displays a list of possible matches. Select one of the encodings from the proposed character sets on the right side of the list. If the choice is correct, readable text appears in the "Sample" element.
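The same "pick an encoding and look at the sample" idea can be sketched in Python. This is only an illustration and says nothing about how Word works internally; the function name preview, the candidate list and the sample phrase are assumptions.

```python
# Decode the same bytes with several candidate encodings and show a short sample of each.
def preview(data, encodings, limit=40):
    """Return {encoding: decoded sample}; undecodable bytes become U+FFFD."""
    return {name: data[:limit].decode(name, errors="replace") for name in encodings}


# Simulate a file saved as CP1251 instead of reading a real one from disk.
raw = "Привет, мир".encode("cp1251")
samples = preview(raw, ["cp1251", "koi8_r", "cp866", "utf-8"])
for name, sample in samples.items():
    print(f"{name:8} {sample}")
```

Only one of the printed samples is readable Russian; the others show the kind of distortion that signals a wrong choice.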

How to determine the encoding with KWrite

Besides the Word word processor, there are other capable utilities. One of them is KWrite (a counterpart for Unix systems). To avoid confusion, here are the steps for determining a document's encoding in KWrite.

  1. Load a file with the .txt extension into the application.
  2. Try the encodings one by one until one of them fits.
  3. To perform step 2, go to the "Tools" menu and open the "Encoding" submenu.
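The trial-and-error procedure in the steps above can be sketched as a small Python loop. In KWrite a person judges whether the text looks right; here that judgment is replaced by a caller-supplied check, which is an assumption of this sketch, as are the function name find_encoding and the sample data.

```python
# Try candidate encodings in order until the decoded text passes a check.
def find_encoding(data, encodings, looks_right):
    """Return the first encoding whose strict decoding satisfies looks_right, else None."""
    for name in encodings:
        try:
            text = data.decode(name)  # strict mode: invalid bytes raise an error
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding, try the next one
        if looks_right(text):
            return name
    return None


# Simulate a file saved as KOI8-R; the check looks for a word we expect in the text.
raw = "Привет, мир".encode("koi8_r")
found = find_encoding(raw, ["utf-8", "cp1251", "koi8_r"], lambda t: "мир" in t)
print(found)
```

A check is needed because single-byte encodings such as CP1251 decode almost any byte sequence without an error, so a successful decode alone does not prove the encoding is right.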

Mozilla Firefox browser: the same goal, determining the encoding

The principle is roughly the same as in the text utilities. Launch the browser; if it is not installed, download the installer from mozilla.org.

Then, in the program window, open the text document via the "File" menu, "Open file" item. If the selected file is displayed without distortion and the text is readable, determining the encoding is easy.

To do this, go to "View", then "encoding". Several character sets are listed there; the one marked with a tick is the encoding the browser detected.

If the text is not recognized correctly, open the "additionally" subsection and experiment with the encodings in it, or select the "auto" value.

Specialized software - working with enca

There are also a number of auxiliary tools that can determine the encoding of plain text.

For those used to working under Unix, there is the enca utility. It can be installed through the package manager: find it among the available packages and start the installation.

To list the supported recognition languages, run the command enca --list languages in a terminal.

To determine the encoding of a text file, put the file name after the -g option and the recognition language after the -L option:

enca -L russian -g /home/vic/temp/myfile.txt
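If you want to call enca from a script, the command above can be wrapped in Python roughly as follows. This is a sketch under assumptions: the function names are made up for illustration, enca must already be installed, and only the -L and -g options shown in this article are used.

```python
import shutil
import subprocess


def build_enca_command(path, language="russian"):
    """Build the argument list for the enca call shown in the article."""
    return ["enca", "-L", language, "-g", path]


def guess_encoding(path, language="russian"):
    """Run enca and return its text output, or None if enca is not installed."""
    if shutil.which("enca") is None:
        return None
    result = subprocess.run(
        build_enca_command(path, language),
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code is left for the caller to interpret
    )
    return result.stdout.strip()


# Show the command that would be executed for the article's example path.
print(" ".join(build_enca_command("/home/vic/temp/myfile.txt")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.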

To sum up what was said about the encoding

I believe the utilities above give the user a sufficient set of tools for determining the encoding of text documents.

That is really all there is to recognizing an encoding. For everyday purposes, I think the software described here is quite adequate. There are more specialized detection methods, but they are beyond the scope of this article.

For Microsoft Word, the source of recognition can be either plain text or a document with complex formatting.
