How to Upload Dataset to Google Colab
This article was published as a part of the Data Science Blogathon.
Introduction
Kaggle hosts a dataset for every possible use case, ranging from the entertainment industry, medical, and e-commerce, to even astronomy. Its users practice on various datasets to test out their skills in the field of Data Science and Machine Learning.
The Kaggle datasets can have varying sizes. Some datasets are as small as under 1 MB while others are as large as 100 GB. Also, some Deep Learning practices require GPU support that can boost the training time. Google Colab is a promising platform that can help beginners to test out their code in a cloud environment.
In this article, I will explain how to load datasets directly from Kaggle into Google Colab notebooks.
Step 1: Select any dataset from Kaggle
The first and foremost step is to choose your dataset from Kaggle. You can select datasets from competitions too. For this article, I am choosing two datasets: one random dataset and one from an active competition.
Screenshot from The Complete Pokemon Images Data Set
Step 2: Download API Credentials
To download data from Kaggle, you need to authenticate with the Kaggle services. For this purpose, you need an API token. This token can be easily generated from the profile section of your Kaggle account. Simply navigate to your Kaggle profile and then,
Click the Account tab and then scroll down to the API section (Screenshot from Kaggle profile)
A file named "kaggle.json" will be downloaded, which contains the username and the API key.
This is a one-time step and you don't need to generate the credentials every time you download a dataset.
Step 3: Set up the Colab Notebook
Fire up a Google Colab notebook and connect it to the cloud instance (basically, start the notebook interface). Then, upload the "kaggle.json" file that you just downloaded from Kaggle.
Screenshot from Colab interface
Now you are all set to run the commands needed to load the dataset. Follow along with these commands:
Note: Here we will run all the Linux and installation commands starting with "!". As Colab instances are Linux-based, you can run all the Linux commands in the code cells.
1. Install the Kaggle library
! pip install kaggle
2. Make a directory named ".kaggle"
! mkdir ~/.kaggle
3. Copy the "kaggle.json" into this new directory
! cp kaggle.json ~/.kaggle/
4. Allocate the required permissions for this file.
! chmod 600 ~/.kaggle/kaggle.json
The Colab notebook is now ready to download datasets from Kaggle.
All the commands needed to set up the Colab notebook
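The shell steps above can also be done from a Python cell, which is handy if you prefer to paste your credentials directly instead of uploading a file. This is a minimal sketch, assuming the `username` and `key` values are placeholders that you replace with the contents of your own "kaggle.json":

```python
import json
from pathlib import Path

# Hypothetical placeholder credentials -- copy the real values from your kaggle.json
creds = {"username": "your-username", "key": "your-api-key"}

kaggle_dir = Path.home() / ".kaggle"
kaggle_dir.mkdir(exist_ok=True)           # same as: ! mkdir ~/.kaggle

cred_path = kaggle_dir / "kaggle.json"
cred_path.write_text(json.dumps(creds))   # same as: ! cp kaggle.json ~/.kaggle/
cred_path.chmod(0o600)                    # same as: ! chmod 600 ~/.kaggle/kaggle.json
```

The `chmod` step matters: the Kaggle CLI warns (or refuses to run) when the credentials file is readable by other users.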
Step 4: Download datasets
Kaggle hosts two types of datasets: Competitions and Datasets. The procedure to download either type remains the same, with minor changes.
Downloading Competitions dataset:
! kaggle competitions download <name-of-competition>
Here, the name of the competition is not the bold title displayed over the background. It is the slug of the competition link that follows after the "/c/". Consider our example link:
"https://www.kaggle.com/c/google-smartphone-decimeter-challenge"
"google-smartphone-decimeter-challenge" is the name of the competition to be passed in the Kaggle command. This will start downloading the data under the allocated storage in the instance:
The output of the command (Notebook screenshot)
Downloading Datasets:
These datasets are not part of any competition. You can download these datasets by:
! kaggle datasets download <name-of-dataset>
Here, the name of the dataset is the "user-name/dataset-name". You can just copy the trailing text after "www.kaggle.com/". Therefore, in our case,
"https://www.kaggle.com/arenagrenade/the-complete-pokemon-images-data-set"
It will be: "arenagrenade/the-complete-pokemon-images-data-set"
In case you get a dataset with a zip extension, you can simply use the unzip command of Linux to extract the data:
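If you want to avoid copying the slug by hand, the rule from the two examples above can be captured in a few lines of Python. This is an illustrative helper (not part of the Kaggle library) that covers the two URL shapes discussed here:

```python
def kaggle_slug(url: str) -> str:
    """Return the identifier the Kaggle CLI expects, given a Kaggle URL."""
    if "/c/" in url:
        # Competition URL: everything after "/c/" is the competition name
        return url.split("/c/")[-1]
    # Dataset URL: everything after the domain is "user-name/dataset-name"
    return url.split("www.kaggle.com/")[-1]

print(kaggle_slug("https://www.kaggle.com/c/google-smartphone-decimeter-challenge"))
# google-smartphone-decimeter-challenge
print(kaggle_slug("https://www.kaggle.com/arenagrenade/the-complete-pokemon-images-data-set"))
# arenagrenade/the-complete-pokemon-images-data-set
```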
! unzip <name-of-file>
Bonus Tips
Tip 1: Download Specific Files
You just saw how to download datasets from Kaggle in Google Colab. It is possible that you are only interested in a specific file and want to download just that file. Then you can use the "-f" flag followed by the name of the file. This will download only that specific file. The "-f" flag works for both the competitions and datasets commands.
Example:
! kaggle competitions download google-smartphone-decimeter-challenge -f baseline_locations_train.csv
The output of the command (Notebook screenshot)
You can check out the Kaggle API official documentation for more features and commands.
Tip 2: Load Kaggle Credentials from Google Drive
In Step 3, you uploaded the "kaggle.json" when executing the notebook. Files uploaded to the storage provided while running the notebook are not retained after the notebook terminates.
It means that you need to upload the JSON file every time the notebook is reloaded or restarted. To avoid this manual work,
1. Simply upload the "kaggle.json" to your Google Drive. For simplicity, upload it in the root folder rather than any folder structure.
2. Next, mount the drive to your notebook:
Steps to Mount Google Drive
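The mount step shown in the screenshot is Colab's standard Drive helper; running it in a code cell prompts you to authorize access:

```python
# Colab-only: mounts your Google Drive under /content/drive
from google.colab import drive
drive.mount('/content/drive')
```

After authorization, your Drive's root folder appears at "/content/drive/MyDrive".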
3. The initial commands for installing the Kaggle library and creating a directory named ".kaggle" remain the same:
! pip install kaggle
!mkdir ~/.kaggle
4. Now, you need to copy the "kaggle.json" file from the mounted Google Drive to the current instance storage. The Google Drive is mounted under the "/content/drive/MyDrive" path. Just run the copy command as used in Linux:
!cp /content/drive/MyDrive/kaggle.json ~/.kaggle/kaggle.json
Now you can easily use the Kaggle competitions and datasets commands to download datasets. This method has the added advantage of not uploading the credentials file on every notebook re-run.
Downloading a dataset after configuring the API key
Benefits of Using Google Colab
Google Colab is a great platform to practice data science questions. One of the major benefits of Colab is the free GPU support. Data science aspirants, in the beginning, are short of computation resources, and therefore using Google Colab solves their hardware problems. The Colab notebooks run on Linux instances, so you can run all the usual Linux commands and interact with the kernel more easily.
The RAM and disk allotment is more than enough for practice datasets, but if your work requires more compute power, you can opt for the paid plan "Colab Pro".
Screenshot from Colab Pro
About the Author
Hi, I am Kaustubh Gupta, a Python Developer capable of Web Scraping, Automation, Data Science, a bit of Backend Web Development with knowledge of CSS and Bootstrap, and Android App Development in Python. I explore everything that can use Python. I am currently mastering Machine Learning algorithms along with real-world applications involving a mixture of all tech stacks.
If you have any doubts, queries, or potential opportunities, you can reach out to me via
1. LinkedIn – in/kaustubh-gupta/
2. Twitter – @Kaustubh1828
3. GitHub – kaustubhgupta
4. Medium – @kaustubhgupta1828
The media shown in this article are not owned by Analytics Vidhya and are used at the Author's discretion.
Source: https://www.analyticsvidhya.com/blog/2021/06/how-to-load-kaggle-datasets-directly-into-google-colab/