kaggle

Data seems to be missing in Bigquery SEC Filing Dataset

 ̄綄美尐妖づ submitted on 2020-07-23 06:16:19
Question: I was pleased recently to discover that BigQuery hosts a dataset of SEC filings. However, I am unable to find the actual text of the filings in the dataset! This seems so obvious that I must be missing something. As an example, the 2018 Microsoft 10-K filing on the SEC website itself can be seen to contain the 10-K text, in which Item 7 includes the phrase "Management's Discussion and Analysis of Financial Condition and Results". I searched for this phrase in the dataset. First, the following query

Setting environment variables in Google Colab

╄→гoц情女王★ submitted on 2020-06-10 03:37:27
Question: I'm trying to use the Kaggle CLI API, and in order to do that, instead of using kaggle.json for authentication, I'm using environment variables to set the credentials. !pip install --upgrade kaggle !export KAGGLE_USERNAME=abcdefgh !export KAGGLE_KEY=abcdefgh !export -p However, the printed list of environment variables doesn't contain the ones I set above. declare -x CLICOLOR="1" declare -x CLOUDSDK_CONFIG="/content/.config" declare -x COLAB_GPU="1" declare -x CUDA_PKG_VERSION="9-2=9.2.148-1"
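The likely cause is that each `!export` in Colab runs in its own throwaway subshell, so the variable is gone by the time the next cell (or `!export -p`) runs. A minimal sketch of the usual fix, setting the variables from Python so that later `!` commands inherit them (the credential values are the question's own placeholders):

```python
import os

# Set the Kaggle credentials in the notebook's own process; shell
# commands launched afterwards with "!" inherit this environment.
os.environ["KAGGLE_USERNAME"] = "abcdefgh"
os.environ["KAGGLE_KEY"] = "abcdefgh"
```

Colab's `%env KAGGLE_USERNAME=abcdefgh` magic achieves the same thing.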

Import CSV data into Google Sheets

两盒软妹~` submitted on 2020-05-30 08:13:10
Question: When trying to use the IMPORTDATA function for this file: https://www.kaggle.com/stefanoleone992/fifa-20-complete-player-dataset#players_20.csv An unexpected error occurs that says it is impossible to import data into the spreadsheet. Is there any other way I can bring this data into my spreadsheet? This data is very important to the work I'm doing; it would save me almost 3 months of work typing and copying everything and then filtering according to my needs. It would
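That URL points at a Kaggle dataset page behind a login, so IMPORTDATA receives HTML rather than CSV. One workaround sketch, assuming the CSV was downloaded manually from Kaggle first: trim it locally with pandas and import the smaller file via File > Import in Google Sheets (the rows below are stand-ins for `pd.read_csv("players_20.csv")`):

```python
import pandas as pd

# Stand-in for the real players_20.csv download.
df = pd.DataFrame({
    "short_name": ["L. Messi", "Cristiano Ronaldo", "Neymar Jr"],
    "overall": [94, 93, 92],
})
# Keep only the rows/columns you actually need before importing to Sheets.
subset = df[df["overall"] >= 93]
subset.to_csv("players_subset.csv", index=False)
```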

Not reading all rows while importing csv into pandas dataframe

点点圈 submitted on 2020-05-29 02:40:32
Question: I am trying the Kaggle challenge here, and unfortunately I am stuck at a very basic step. My limited Python knowledge is to blame for this. I am trying to read the dataset into a pandas DataFrame by executing the following command: test = pd.DataFrame.from_csv("C:/Name/DataMining/hillary/data/output/emails.csv") The problem is that this file, as you would find out, has over 300,000 records, but I am reading only 7,945: print(test.shape) gives (7945, 21). Now I have double-checked the file and I
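`pd.DataFrame.from_csv` was deprecated and later removed; `pd.read_csv` is the supported reader, and it handles the quoted multi-line email bodies that can make naive line counting misleading. A small self-contained demonstration (the inline CSV is synthetic):

```python
import io
import pandas as pd

# Two records, one of which spans two physical lines thanks to a quoted
# multi-line field; read_csv still reports exactly two rows.
csv_text = 'id,body\n1,"line one\nline two"\n2,"hello"\n'
emails = pd.read_csv(io.StringIO(csv_text))
print(emails.shape)  # (2, 2)
```

If the real file still truncates, `pd.read_csv(path, quoting=csv.QUOTE_ALL)` or inspecting the row where parsing stops is the next step.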

documentation for Kaggle API *within* python?

陌路散爱 submitted on 2020-05-25 14:42:12
Question: I want to write a Python script that downloads a public dataset from Kaggle.com. The Kaggle API is written in Python, but almost all of the documentation and resources I can find cover how to use the API on the command line, and very little covers how to use the kaggle library within Python. Some users seem to know how to do this (see for example several answers to this question), but the hints are not enough to resolve my specific issue. Namely, I have a script that looks like this: from
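For library use from Python rather than the CLI, the `KaggleApi` class that the CLI itself wraps can be called directly. A sketch, assuming the `kaggle` package is installed and credentials exist in ~/.kaggle/kaggle.json or in the KAGGLE_USERNAME/KAGGLE_KEY environment variables (the dataset slug is illustrative):

```python
def download_dataset(slug, path="."):
    """Download a public Kaggle dataset from within Python.

    Sketch only: requires the `kaggle` package and configured
    credentials (kaggle.json or KAGGLE_USERNAME/KAGGLE_KEY).
    """
    # Imported inside the function so this module loads without kaggle.
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()  # reads kaggle.json / env vars
    api.dataset_download_files(slug, path=path, unzip=True)

# Example (not executed here):
# download_dataset("stefanoleone992/fifa-20-complete-player-dataset")
```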

How to convert the IPython notebook on Kaggle to PDF and download it?

梦想与她 submitted on 2020-05-11 07:40:29
Question: I want to download the whole notebook, with code and respective output, as a .pdf file. Is there any way to do this? I have tried downloading the IPython notebook, opening it on my PC in Jupyter Notebook, and then converting it to PDF, but I'm searching for a direct way to do it. Answer 1: TL;DR As of now, downloading the Jupyter notebook and then converting it to PDF is the quickest way. If you still wish to convert the notebook to PDF on Kaggle itself, you can do it using command
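A sketch of the download-then-convert route the answer recommends, driven from Python (assumes `jupyter` plus a LaTeX toolchain are installed; "notebook.ipynb" is a placeholder filename):

```python
import subprocess

# Convert a downloaded notebook to PDF via nbconvert. Requires jupyter
# and a LaTeX toolchain; "notebook.ipynb" is a placeholder.
cmd = ["jupyter", "nbconvert", "--to", "pdf", "notebook.ipynb"]
# Uncomment to actually run the conversion:
# subprocess.run(cmd, check=True)
```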

Reproducing "Deep Convolutional Network Cascade for Facial Point Detection"

梦想的初衷 submitted on 2020-05-08 16:22:51
1. Introduction
It's been a while! Some time ago, while looking for material online, I came across a paper on facial keypoint detection that can fairly be called the pioneering work in this area. Paper page: http://mmlab.ie.cuhk.edu.hk/archive/CNN_FacePoint.htm; a Chinese translation is available in the blog post 基于DCNN的人脸特征点定位 (facial feature point localization based on DCNN). After reading it, I found the paper's approach very helpful, and the overall network is not too complex to build, so I decided to reproduce the paper and see how it performs, making a few minor modifications to the proposed network along the way. The project stalled halfway because of other commitments and was only finished in the last few days, so let's walk through the implementation. My training environment is Python 3.6, tensorflow-gpu, CUDA 9.1, and cuDNN 7, with each network trained for 1000 epochs. The final training results are shown in the figure below: the red points are the coordinates predicted by the network and the blue points are the ground-truth coordinates from the dataset. The predictions are reasonably good overall, but there is still a noticeable gap around the mouth corners.
2. Network structure
The paper's overall idea is to split the network into two modules. The first module crops the face region out of the original image (using OpenCV, Faster R-CNN, or another trained detector) to provide the input data for the second module's keypoint detection. Since I used the facial keypoint dataset provided on Kaggle, I did not use the first module
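The first module's job (cropping the face region before keypoint detection) can be approximated with OpenCV's bundled Haar cascade. A sketch only, standing in for the paper's detection stage rather than reproducing it; `crop_face` and the image path are illustrative:

```python
def crop_face(image_path):
    """Crop the first detected face region from an image.

    Sketch: uses OpenCV's bundled Haar cascade instead of the paper's
    first-module network; assumes opencv-python is installed.
    """
    import cv2  # imported here so this module loads without OpenCV

    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # no face found
    x, y, w, h = faces[0]
    return img[y:y + h, x:x + w]
```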

[Attention Mechanisms in CV] The scSE Module in Semantic Segmentation

故事扮演 submitted on 2020-05-08 10:22:12
Preface: This article introduces scSE, an attention module for semantic segmentation. The scSE module is quite similar to the BAM module introduced earlier, except that scSE is applied and tested only on semantic segmentation, where it brings a fairly large improvement in accuracy. The paper proposing the scSE module is "Concurrent Spatial and Channel 'Squeeze & Excitation' in Fully Convolutional Networks". It improves on the SE module, proposing three variants (cSE, sSE, and scSE) and demonstrating experimentally that such modules can enhance meaningful features and suppress useless ones. The experiments were run on two medical datasets, the MALC Dataset and the Visceral Dataset. Most semantic segmentation models take a U-Net-like encoder-decoder form: downsample first, then upsample back to the original resolution. An SE module can be inserted after each convolutional layer to refine the feature-map information, as shown in the scheme below. The three modules derived from SE are then introduced one by one; first, the legend for the figures. cSE module: this module resembles the channel-attention module in BAM, and its implementation is easy to understand from the figure. The concrete flow: global average pooling takes the feature map from [C, H, W] to [C, 1, 1
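The cSE flow described above can be sketched in plain NumPy. A sketch only: `w1` and `w2` are hypothetical fully connected weights with a reduction ratio of 2; in a real network they are learned parameters inside a DL framework:

```python
import numpy as np

def cse_block(feature_map, w1, w2):
    """cSE forward pass on one feature map of shape [C, H, W].

    w1 ([C//2, C]) and w2 ([C, C//2]) stand in for the two FC layers.
    """
    # Global average pooling: [C, H, W] -> [C] (the [C, 1, 1] squeeze)
    z = feature_map.mean(axis=(1, 2))
    # FC -> ReLU -> FC -> sigmoid: one excitation weight per channel
    s = np.maximum(w1 @ z, 0.0)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))
    # Channel-wise recalibration of the original feature map
    return feature_map * s[:, None, None]

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 4))
out = cse_block(fmap, rng.standard_normal((4, 8)), rng.standard_normal((8, 4)))
```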

Titanic Survival Prediction in Python (Revised)

谁说胖子不能爱 submitted on 2020-05-06 06:43:36
Steps:
1. Pose the question
2. Understand the data: (1) collect the data, (2) import the data, (3) inspect the data
3. Clean the data: (1) preprocessing, (2) feature engineering
4. Build the model
5. Evaluate the model
6. Deploy the solution and write up the report

1. Pose the question: What kind of people were more likely to survive this disaster?
2. Understanding the data:
(1) Collect the data: download it from the Kaggle Titanic project page: https://www.kaggle.com/c/titanic. I used a copy of the dataset found through Baidu; netdisk link: https://pan.baidu.com/s/1BfRZdCz6Z1XR6aDXxiHmHA, extraction code: jzb3.
(2) Import the data:

# import the data-handling packages
import numpy as np
import pandas as pd
# import the data
# training set
train = pd.read_csv("./train.csv")
# test set
test = pd.read_csv("./test.csv")
# remember: the training set has 891 rows, which matters later when
# splitting out a validation set before submitting results to Kaggle
print('training set:', train.shape, 'test set:', test.shape)

training set: (891, 12) test set: (418, 11)

rowNum_train = train.shape[0]
rowNum_test = test
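Once the two CSVs are loaded, the "understand the data" step typically begins with a quick aggregate that previews the answer to "who was more likely to survive?". A minimal sketch on synthetic rows (column names follow the Kaggle Titanic schema; real values come from train.csv):

```python
import pandas as pd

# Synthetic stand-in for train.csv, using the Titanic column names.
train = pd.DataFrame({
    "Sex": ["female", "male", "female", "male"],
    "Survived": [1, 0, 1, 1],
})
# Mean of the 0/1 Survived column per group = survival rate per group.
survival_by_sex = train.groupby("Sex")["Survived"].mean()
```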