Question
Microsoft Excel has the ability to use data from many sources. Other than the use of copy/paste, which is usually limited to relatively small amounts of data, and often not formatted very well once imported, users use the data import feature, using delimited text files. What is a delimiter, and what are some of the advantages and disadvantages of using characters, such as commas, tabs, vertical bar |, and so on?
Explanation / Answer
A delimiter is a sequence of one or more characters used to specify the boundary between separate, independent regions in plain text or other data streams. An example of a delimiter is the comma character, which acts as a field delimiter in a sequence of comma-separated values.
A delimiter is one or more characters that separate text strings. Common delimiters are commas (,), semicolons (;), quotes (", '), braces ({}), pipes (|), and slashes (/). When a program stores many data values, it may use a delimiter to separate them. For example, in "john|doe" the pipe is the delimiter, so a program or script can distinguish the first name from the last name in the string.
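As a minimal sketch of the "john|doe" example, the split can be done in Python as shown here. The variable names are illustrative only.

```python
# Split a pipe-delimited record into its two fields.
record = "john|doe"
first_name, last_name = record.split("|")

print(first_name)  # john
print(last_name)   # doe
```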
Delimiters can also be used to separate the data items in a database (the columns in a database table) when transferring the data to another application. For example, a comma-separated values (CSV) file is one in which each value in the cells of a table row is separated from the next value by a comma. The beginning of a new row is indicated by a newline character.
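To illustrate how rows and columns map onto lines and commas, here is a short sketch using Python's standard csv module. The table contents are made up for the example.

```python
import csv
import io

# Hypothetical table exported as CSV: one row per line, one comma between cells.
csv_text = "first_name,last_name,city\njohn,doe,Boston\njane,roe,Denver\n"

rows = list(csv.reader(io.StringIO(csv_text)))
header, records = rows[0], rows[1:]

print(header)      # ['first_name', 'last_name', 'city']
print(records[0])  # ['john', 'doe', 'Boston']
```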
Due to their widespread use, comma- and tab-delimited text files can be opened by several kinds of applications, including most spreadsheet programs and statistical packages, sometimes even without the user designating which delimiter has been used.
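One way a program can guess the delimiter on its own is sketched below with csv.Sniffer from Python's standard library. The sample data is hypothetical, and the sniffer is a heuristic that can fail on short or irregular input, so this should be read as an illustration rather than a robust detector.

```python
import csv
import io

# Hypothetical tab-delimited sample; the program is not told which delimiter it uses.
sample = "name\tage\tcity\njohn\t34\tBoston\njane\t29\tDenver\n"

# Ask the sniffer to choose among a few common candidates.
dialect = csv.Sniffer().sniff(sample, delimiters=",\t|;")
rows = list(csv.reader(io.StringIO(sample), dialect))

print(repr(dialect.delimiter))  # '\t'
print(rows[1])                  # ['john', '34', 'Boston']
```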
Typically a delimited file format is indicated by a specification. Some specifications provide conventions for avoiding delimiter collision, others do not. Delimiter collision is a problem that occurs when a character that is intended as part of the data gets interpreted as a delimiter instead. Comma- and space-separated formats often suffer from this problem, since in many contexts those characters are legitimate parts of a data field.
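Delimiter collision is easy to reproduce. In this small sketch the record is hypothetical: it is meant to hold two fields, a name written as "Doe, John" and a city, but a naive split on commas produces three.

```python
# Intended fields: name = "Doe, John", city = "Boston"
line = "Doe, John,Boston"

fields = line.split(",")
print(fields)       # ['Doe', ' John', 'Boston']
print(len(fields))  # 3, although only 2 fields were intended
```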
Most such files avoid delimiter collision either by surrounding all data fields with double quotes or by quoting only those fields that contain the delimiter character. One problem with tab-delimited text files is that tabs are difficult to distinguish from spaces, so files are sometimes corrupted when people edit them by hand. Another set of problems arises from errors in the file structure, which usually surface when the file is imported into a database (for example, a record in which a pupil's first name is missing).
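The quoting approach can be sketched with Python's csv module, which by default quotes only the fields that need it (csv.QUOTE_MINIMAL). The row contents are made up for the example.

```python
import csv
import io

# Hypothetical row whose first field contains the delimiter itself.
row = ["Doe, John", "Boston"]

buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_MINIMAL).writerow(row)
encoded = buffer.getvalue()
print(repr(encoded))  # '"Doe, John",Boston\r\n'

# Reading the line back recovers the original two fields.
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded)        # ['Doe, John', 'Boston']
```

Passing quoting=csv.QUOTE_ALL instead would surround every field with double quotes, which corresponds to the first of the two conventions described above.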
Depending on the data itself, it may be beneficial to use less common characters such as the tilde (~) as delimiters. With the rising prevalence of websites and other applications that store snippets of code in databases, relying on a double quote ("), which occurs in every hyperlink and image source tag, is not sufficient to avoid this type of collision. Since colons (:), semicolons (;), pipes (|), and many other characters are also used in such data, it can be quite challenging to find a character that does not appear elsewhere.
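One possible way to look for a safe delimiter is to scan the data and pick the first candidate character that never occurs in it. The function name pick_delimiter, the candidate list, and the sample snippets below are all assumptions made for this sketch.

```python
def pick_delimiter(values, candidates=("\t", "|", "~", ";", ",")):
    """Return the first candidate that appears in none of the values, or None."""
    for candidate in candidates:
        if not any(candidate in value for value in values):
            return candidate
    return None


# Hypothetical stored snippets that already contain several common delimiters.
snippets = ['<a href="https://example.com/a;b">x|y</a>', "a,b\tc"]

chosen = pick_delimiter(snippets)
print(repr(chosen))  # '~'
```

If every candidate occurs somewhere in the data, the function returns None, which signals that quoting or escaping is needed rather than a different delimiter.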
For comments in source code, the main disadvantage of paired delimiters is diminished reliability. It is easy to inadvertently leave off the closing delimiter, which extends the comment to the end of the next comment and effectively removes code from the program. The advantage of paired delimiters is that you can comment out whole areas of a program. The disadvantage of using only beginning delimiters is that they must be repeated on every line of a block of comments, which can be tedious and therefore error-prone. The advantage is that you cannot make the mistake of forgetting the closing delimiter.