Проверка csv файла на ошибки

Check whether the CSV file is valid or not, this tool also give hint about where the location of the error for easier debugging.

CSV Validator & Linter

What is CSV ?

Comma-separated values (CSV) file is a delimited text file that uses a comma or other delimiter to separate values.

This tool support custom delimiter
This tool also give information about the error if there are any for debugging purpose (linting)

Sample of Valid CSV Text :

cat, dog, horse
house, car, cycle

About

Toolkit Bay or TKB is an online tools website providing free and easy to use tools to increase productivity.

If you have any inquiries or suggestions or issues, you can contact us on:

contact@toolkitbay.com

Data & Privacy

We respect your data. Uploaded file/data/input will be automatically deleted. And the processed data will be deleted less than a day.

More detail on privacy here

Copyright © 2021 Toolkit Bay. All Rights Reserved

CI
GitHub license
NPM package version

CSV validator

A CLI tool to validate a CSV file against a set of rules defined with JSON Schema. It’s heavily inspired from csval, with some additions to match my needs.

Usage

Usage: csv-validator [options] <csvFile> <rulesFile>

Validate a CSV file against a set of rules defined with JSON Schema.

Options:
  -V, --version              output the version number
  -a, --abort-early          move to the next line as soon as an error is encountered
  -d, --dynamic-typing       convert data into the appropriate type according to their format
  -e, --encoding <encoding>  specify the encoding of the files (default: "utf8")
  -q, --quiet                hide the list of errors encountered
  -s, --skip-empty-lines     ignore empty lines in the CSV file
  -h, --help                 display help for command

Rules files

The rules files must follow the JSON Schema specification.

Basic example

Consider the following rules file.

{
  "type": "object",
  "properties": {
    "brand": {
      "type": "string"
    },
    "model": {
      "type": "string"
    },
    "color": {
      "type": "string"
    },
    "mileage": {
      "type": "integer",
      "minimum": 0
    },
    "sold": {
      "type": "boolean"
    }
  }
}

Valid file

The following CSV file will pass validation.

brand,model,color,mileage,sold
Toyota,Prius,blue,45108,true
Opel,Zafira,red,2784,false
Nissan,Micra,green,98410,false
File "cars.csv" passes validation checks.

Inconsistent number of fields

The following CSV file will fail validation because the number of fields is inconsistent across rows.

brand,model,color,mileage,sold
Toyota,Prius,blue,45108,true
Opel,red,2784,false
Nissan,Micra,green,98410,false
Error at row 2: Too few fields: expected 4 fields but parsed 3.
File "cars.csv" fails validation checks.

Schema violation

The following CSV file will fail validation because some data violates the rules.

brand,model,color,mileage,sold
Toyota,Prius,blue,-14,true
Opel,Zafira,red,2784,false
Nissan,Micra,green,98410,false
Error at row 1: Field "mileage" must be greater than or equal to 0.
File "cars.csv" fails validation checks.

Advanced examples

Nullable types

Consider the following rules file.

{
  "type": "object",
  "properties": {
    "name": {
      "type": "string"
    },
    "race": {
      "type": "string"
    },
    "chip_number": {
      "oneOf": [
        { "type": "null" },
        { "type": "integer" }
      ]
    }
  }
}

The following CSV file should pass validation, because data seems to match the rules.

name,race,chip_number
Bounty,Golden Retriever,
Ritsuka,Akita,1784647826

But in practice it will fail.

Error at row 1: Field "chip_number" does not match any of the allowed types.
File "dogs.csv" fails validation checks.

This is because the CSV format does not have a standard method to represent null fields. Therefore empty fields are converted to empty string, not null values. To allow the use of the «null» type in the rule files, it is possible to use the -d or --dynamic-typing option, which will convert the data into the type that seems best suited to their representation.

The file then passes validation.

File "dogs.csv" passes validation checks.

Multiple errors

When validating large datasets, the number of error messages may quickly become unmanageable. It is then possible to use several options to limit the number of errors displayed.

Consider the following rules file and CSV file.

{
  "type": "object",
  "properties": {
    "first_name": {
      "type": "string"
    },
    "last_name": {
      "type": "string"
    },
    "age": {
      "type": "integer",
      "minimum": 0
    }
  }
}
first_name,last_name,age
Stefania,Josh,23
Damiano,,unknown
Paulie,Niese,78
,Vasyutkin,
Marlena,,68
Caressa,Hanington,
Glenine,,72
Ilyssa,Kelling,48
Syd,,unk
Babara,Killcross,59

Without any options, the validation will fail with the following errors.

Error at row 2: Field "last_name" is not allowed to be empty.
Error at row 2: Field "age" must be a number.
Error at row 4: Field "first_name" is not allowed to be empty.
Error at row 4: Field "age" must be a number.
Error at row 5: Field "last_name" is not allowed to be empty.
Error at row 6: Field "age" must be a number.
Error at row 7: Field "last_name" is not allowed to be empty.
Error at row 9: Field "last_name" is not allowed to be empty.
Error at row 9: Field "age" must be a number.
File "people.csv" fails validation checks.

As you can see, some lines contain several errors. Let’s assume now that you just want to know which lines contain errors. You can use the -a or --abort-early option, and the output will then look like this.

Error at row 2: Field "last_name" is not allowed to be empty.
Error at row 4: Field "first_name" is not allowed to be empty.
Error at row 5: Field "last_name" is not allowed to be empty.
Error at row 6: Field "age" must be a number.
Error at row 7: Field "last_name" is not allowed to be empty.
Error at row 9: Field "last_name" is not allowed to be empty.
File "people.csv" fails validation checks.

And if you just want to know whether a file is valid or not, you can use the -q or --quiet option. The output will then look like this.

File "people.csv" fails validation checks.

Empty lines

Now imagine that your CSV file is perfectly valid, but contains an additional empty line at the end. By default, validation will fail with a message like this.

Error at row 11: Too few fields: expected 3 fields but parsed 1.
File "people.csv" fails validation checks.

The -s or --skip-empty-lines option allows you to ignore empty lines within the file. The file is then again considered valid.

File "people.csv" passes validation checks.

Are there any good sites/services to validate consistency of CSV file ?

The same as W3C validator but for CSV ?

Thanks!

asked Jul 18, 2011 at 20:27

Scherbius.com's user avatar

Scherbius.comScherbius.com

3,4064 gold badges24 silver badges44 bronze badges

2

The Open Data Institute is developing a CSV validation service that will allow users to check the structure of their data as well as validate it against a simple schema.

The service is still very much in alpha but can be found here:

http://csvlint.io/

The code for the application and the underlying library are both open source:

https://github.com/theodi/csvlint

https://github.com/theodi/csvlint.rb

The README in the library provides a summary of the errors and warnings that can be generated. The following types of error can be reported:

  • :wrong_content_type — content type is not text/csv
  • :ragged_rows — row has a different number of columns (than the first row in the file)
  • :blank_rows — completely empty row, e.g. blank line or a line where all column values are empty
  • :invalid_encoding — encoding error when parsing row, e.g. because of invalid characters
  • :not_found — HTTP 404 error when retrieving the data
  • :quoting — problem with quoting, e.g. missing or stray quote, unclosed quoted field
  • :whitespace — a quoted column has leading or trailing whitespace

The following types of warning can be reported:

  • :no_encoding — the Content-Type header returned in the HTTP request does not have a charset parameter
  • :encoding — the character set is not UTF-8
  • :no_content_type — file is being served without a Content-Type header
  • :excel — no Content-Type header and the file extension is .xls
  • :check_options — CSV file appears to contain only a single column
  • :inconsistent_values — inconsistent values in the same column. Reported if <90% of values seem to have same data type (either numeric or alphanumeric including punctuation)

answered Feb 11, 2014 at 17:55

ldodds's user avatar

ldoddsldodds

2492 silver badges4 bronze badges

1

The National Archives developed a CSV Schema Language and CSV Validator, software written in Java. It’s open source.

answered Aug 7, 2016 at 12:05

Milos's user avatar

MilosMilos

1923 silver badges11 bronze badges

To validate a CSV file I use the RAINBOW CSV extension in Visual Studio Code and also I open the CSV file in Excel.

answered Feb 15, 2018 at 16:18

mruanova's user avatar

mruanovamruanova

6,1036 gold badges37 silver badges55 bronze badges

There is a great way to validate your CSV file.I am referring to this article, where the whole process is explained in tiniest details.

The validation process has two steps: the first one is to post the file to the API. Once your file is accepted,the API returns a polling endpoint that contains the results of the validation process.10 MB limit per file.

answered Feb 5, 2020 at 23:45

monkrus's user avatar

monkrusmonkrus

1,42024 silver badges23 bronze badges

CSV Lint at csvlint.com (not .io :) is a service we’re building to solve this problem. It checks CSV files against user-defined validation rules / schemas cell by cell.

We spent a lot of time tweaking the UI to allow users to create complex validation rules / schemas easily that meet their business needs without a single line of code.

Our offline validation feature allows users to see the results in-realtime even when validating multiple large size (with millions+ rows) files, and most importantly it 100% protects user data privacy.

answered Jun 17, 2018 at 6:57

Joe's user avatar

JoeJoe

2791 gold badge4 silver badges15 bronze badges

1

I wrote an open source Python tool to simplify validation of such files available from http://pypi.python.org/pypi/cutplace/.

The basic idea is that you describe the data format in a structured interface specification using OpenOffice.org, Excel or plain CSV. This is done in a few minutes and legible enough to serve as documentation too. We use it to validate files with about 200.000 rows on a daily base.

You can validate a CSV file using the command line:

cutplace specification.csv data.csv

In case invalid data rows are found, the exit code is 1. If you need more control, you can write a little Python script that imports the cutplace module and adds a listener for validation events.

As example, here’s a specification that would validate the sample data you provided, filling the gaps of your short description by making a few assumptions. (I’m writing the specification in CSV to inline it in this post. In practice I prefer OpenOffice.org’s Calc and ODS because I can use more formating and make it easier to read and maintain.)

,"Interface: Show statistics"
,
,"Data format"
"D","Format","CSV"
"D","Item delimiter",";"
"D","Header","1"
"D","Encoding","ASCII"
,
,"Fields"
,"Name","Example","Empty","Length","Type","Rule"
"F","date","15-Mar-10",,,"RegEx","dd-[A-Z][a-z][a-z]-dd"
"F","id","231",,,"Integer","0:"
"F","shown","345",,,"Integer","0:"
,
,"Checks"
,"Description","Type","Rule"
"C","id per date must be unique","IsUnique","date, id"

Lines starting with «D» describe the basic data format. In this case it is a CSV file using «;» as delimiter with 1 header line in ASCII encoding.

Lines starting with «F» describe the various fields. For example,

,"Name","Example","Empty","Length","Type","Rule"
"F","id","231",,,"Integer","0:"

defines a mandatory field «id» of type Integer with a value of 0 or greater. To allow the field to be empty, specify an «X» in the «Empty» column:

,"Name","Example","Empty","Length","Type","Rule"
"F","id","231","X",,"Integer","0:"

Finally there is an optional section to contain more advances checks spawning the whole file, not only single rows. For example, if each date in your file must provide date for an id only once, you can state this using:

,"Description","Type","Rule"
"C","id per date must be unique","IsUnique","date, id"

Any row that starts with an empty column can contain any text you like and will not be processed during validation. This is useful for headings, comments and so on.

Возможно, вам также будет интересно:

  • Проверка smb conf на ошибки
  • Проверка crontab на ошибки
  • Проверка rss ленты на ошибки
  • Проверка ccd на ошибки
  • Проверка ram на ошибки windows 10

  • Понравилась статья? Поделить с друзьями:
    0 0 голоса
    Рейтинг статьи
    Подписаться
    Уведомить о
    guest

    0 комментариев
    Старые
    Новые Популярные
    Межтекстовые Отзывы
    Посмотреть все комментарии