Problems importing datasets with seaborn's load_dataset method

Published: 2023-02-02 15:30

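When loading one of the classic datasets that ship with seaborn, such as iris, calling load_dataset directly often raises a URLError. The cause is simply that the machine cannot reach the external site the data is fetched from. Looking at the load_dataset method itself, we find the following docstring: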

def load_dataset(name, cache=True, data_home=None, **kws):
    """Load a dataset from the online repository (requires internet).

    Parameters
    ----------
    name : str
        Name of the dataset (`name`.csv on
        https://github.com/mwaskom/seaborn-data).  You can obtain list of
        available datasets using :func:`get_dataset_names`
    cache : boolean, optional
        If True, then cache data locally and use the cache on subsequent calls
    data_home : string, optional
        The directory in which to cache data. By default, uses ~/seaborn-data/
    kws : dict, optional
        Passed to pandas.read_csv

    """

This shows that the datasets can be downloaded directly from GitHub, at the following address:
https://github.com/mwaskom/seaborn-data
We also notice the data_home parameter:
data_home : string, optional
The directory in which to cache data. By default, uses ~/seaborn-data/
This parameter specifies the directory where the cached data is stored.
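As a rough illustration of what that default resolves to, the sketch below expands ~/seaborn-data/ for the current user and builds the path where a cached CSV would be expected. The helper names default_cache_dir and expected_cache_file are hypothetical, and the default used is the one stated in the quoted docstring; other seaborn versions may use a different location.

```python
from pathlib import Path


def default_cache_dir():
    """Expand the default from the quoted docstring (~/seaborn-data/)."""
    # Assumption: the default is ~/seaborn-data/, as the docstring above says.
    return Path("~/seaborn-data").expanduser()


def expected_cache_file(name, data_home=None):
    """Return the path where `name`.csv is expected inside the cache directory."""
    base = Path(data_home).expanduser() if data_home is not None else default_cache_dir()
    return base / f"{name}.csv"


# Show where a manually downloaded iris.csv would need to be placed.
print(expected_cache_file("iris"))
```

Printing this path on the target machine is a quick way to confirm which folder the downloaded files should go into.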
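On Linux, ~/seaborn-data/ usually refers to a folder under the home directory; on Windows the equivalent should be the user directory. I created a seaborn-data folder under my user directory, downloaded the datasets from GitHub, and placed them in that folder. After that, the datasets load correctly.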
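The workaround can be sketched as follows, assuming the CSV files from the GitHub repository have already been copied into a local folder. The function name load_cached is hypothetical; name, cache and data_home are the parameters from the signature quoted earlier. Passing data_home explicitly avoids depending on whatever default the installed seaborn version uses.

```python
from pathlib import Path


def load_cached(name, data_home):
    """Load a dataset from a local seaborn-data folder.

    Hypothetical helper: it checks that `name`.csv is already present in
    `data_home` before handing over to seaborn, so a missing file gives a
    clear error instead of a failed download attempt.
    """
    csv_path = Path(data_home).expanduser() / f"{name}.csv"
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"{csv_path} not found; download {name}.csv from "
            "https://github.com/mwaskom/seaborn-data and place it there"
        )
    # Imported here so the path check above works even without seaborn installed.
    import seaborn as sns

    # With cache=True, seaborn is expected to reuse the local copy.
    return sns.load_dataset(name, cache=True, data_home=str(csv_path.parent))


# Example call (requires seaborn and a local iris.csv):
# iris = load_cached("iris", "~/seaborn-data")
# print(iris.head())
```

If the file is missing, the helper stops with a FileNotFoundError that names the expected location, which is easier to diagnose than a URLError.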
