Converting a CSV file to a LIBSVM-compatible data file using Python

春和景丽 2020-12-31 23:20

I am doing a project using libsvm and I am preparing my data for the library. How can I convert a CSV file to LIBSVM-compatible data?

CSV File: https:

2 Answers
  • 2020-12-31 23:47

    csv2libsvm.py does not work with Python 3, and it does not support string labels (label targets), so I have modified it slightly. It should now work with Python 3 and with string labels. I am very new to Python, so my code may not follow best practice, but I hope it helps someone.

    #!/usr/bin/env python
    
    """
    Convert a CSV file to libsvm format. Features must be numeric; string labels are mapped to integer ids.
    Put -1 as label index (argv[3]) if there are no labels in your file.
    Expecting no headers. If present, headers can be skipped with argv[4] == 1.
    
    """
    
    import sys
    import csv
    
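    def construct_line(label, line, labels_dict):
        """Build one libsvm line: '<label> <index>:<value> ...' (indices start at 1)."""
        new_line = []
        try:
            # Numeric label: keep it as-is, normalising any zero to "0".
            if float(label) == 0.0:
                label = "0"
            new_line.append(label)
        except ValueError:
            # String label: map it to an integer id in order of first appearance.
            if label not in labels_dict:
                labels_dict[label] = str(len(labels_dict))
            new_line.append(labels_dict[label])
    
        for i, item in enumerate(line):
            if item == 'NaN':
                # Missing value: written as an explicit zero, as before.
                item = "0.0"
            elif item == '' or float(item) == 0.0:
                # libsvm is a sparse format, so empty and zero features are omitted.
                continue
            new_item = "%s:%s" % (i + 1, item)
            new_line.append(new_item)
        return " ".join(new_line) + "\n"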
    
    # ---
    
    input_file = sys.argv[1]
    try:
        output_file = sys.argv[2]
    except IndexError:
        output_file = input_file + ".out"
    
    try:
        label_index = int(sys.argv[3])
    except IndexError:
        label_index = 0
    
    try:
        # Any non-empty string is truthy, so "0" and "False" are excluded explicitly.
        skip_headers = sys.argv[4].lower() not in ('', '0', 'false')
    except IndexError:
        skip_headers = False
    
    # The csv module expects text mode with newline='' in Python 3.
    with open(input_file, 'rt', newline='') as i, \
            open(output_file, 'wt', encoding='utf-8', newline='\n') as o:
        reader = csv.reader(i)
    
        if skip_headers:
            next(reader)  # discard the header row
    
        labels_dict = {}
        for line in reader:
            if not line:
                continue  # ignore blank lines
            if label_index == -1:
                label = '1'
            else:
                label = line.pop(label_index)
    
            o.write(construct_line(label, line, labels_dict))
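
The string-label handling above can be looked at in isolation. The following is a minimal sketch, not part of csv2libsvm.py: `encode_labels` is a hypothetical helper name, and it assumes that ids are assigned in order of first appearance, starting at 0, as the script does with `labels_dict`.

```python
def encode_labels(labels):
    """Map each distinct label to an integer id in order of first appearance."""
    mapping = {}
    encoded = []
    for label in labels:
        if label not in mapping:
            # The next free id is simply the number of labels seen so far.
            mapping[label] = len(mapping)
        encoded.append(mapping[label])
    return encoded, mapping


encoded, mapping = encode_labels(["setosa", "virginica", "setosa", "versicolor"])
print(encoded)  # [0, 1, 0, 2]
print(mapping)  # {'setosa': 0, 'virginica': 1, 'versicolor': 2}
```

Because the ids depend on the order of rows, converting a training file and a test file separately can assign different ids to the same label; reusing one mapping for both files avoids that.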
    
  • 2020-12-31 23:55

    You can use csv2libsvm.py to convert csv to libsvm data

    python csv2libsvm.py iris.csv libsvm.data 4 True
    

    where 4 is the zero-based index of the target column, and True means the CSV has a header row.

    Finally, you can get libsvm.data as

    0 1:5.1 2:3.5 3:1.4 4:0.2
    0 1:4.9 2:3.0 3:1.4 4:0.2
    0 1:4.7 2:3.2 3:1.3 4:0.2
    0 1:4.6 2:3.1 3:1.5 4:0.2
    ...
    

    from iris.csv

    150,4,setosa,versicolor,virginica
    5.1,3.5,1.4,0.2,0
    4.9,3.0,1.4,0.2,0
    4.7,3.2,1.3,0.2,0
    4.6,3.1,1.5,0.2,0
    ...
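
To see how the rows above turn into the libsvm lines, here is a small in-memory sketch. It is not csv2libsvm.py itself: `csv_to_libsvm_lines` and `SAMPLE_CSV` are hypothetical names, and the sketch assumes numeric features and a label that is already numeric.

```python
import csv
import io

# Hypothetical in-memory copy of the first rows of iris.csv shown above.
SAMPLE_CSV = """150,4,setosa,versicolor,virginica
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
"""


def csv_to_libsvm_lines(text, label_index, skip_header):
    """Return libsvm-formatted lines for CSV text with numeric features."""
    reader = csv.reader(io.StringIO(text))
    if skip_header:
        next(reader)  # discard the header row
    lines = []
    for row in reader:
        label = row.pop(label_index)
        # Feature indices start at 1; zero-valued features are omitted.
        features = ["%d:%s" % (i + 1, v) for i, v in enumerate(row) if float(v) != 0.0]
        lines.append(" ".join([label] + features))
    return lines


for line in csv_to_libsvm_lines(SAMPLE_CSV, label_index=4, skip_header=True):
    print(line)
```

With `label_index=4` the last column is removed from each row and written first, and the remaining four values are numbered 1 to 4.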
    