What’s your take on parquet?

I’m still reading into it. Why is it so closely tied to Apache? Does only Apache push it? Meaning, if Apache dropped it, would nobody else be interested in developing it further?

It’s published under the Apache License 2.0, which is a permissive license. Is there a drawback to that license?

Do you use it? When?

I assume CSV is sufficient for sharing small data. I also assume CSV is more accessible than Parquet.

  • Dunstabzugshaubitze@feddit.org
    14 days ago

    Parquet is closely tied to the Apache Software Foundation because it was designed as a storage format for Hadoop.

    But many data processing libraries offer interfaces for Parquet files, so you can use it outside of the Hadoop ecosystem.

    It’s really good for archiving data, because the format stores a lot of data in relatively little disk space while still giving OK read performance. Parquet files are laid out column by column and compressed, so you often don’t need to read the whole file, only the columns you actually need. A CSV file, by contrast, is plaintext that takes up more disk space and generally has to be scanned in full.
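
To make the "you won’t need to read the whole file" point concrete, here is a rough standard-library sketch. It is not Parquet itself, just a toy column store that compresses each column separately and compares it with plain CSV. The data and the helper names (`to_csv_bytes`, `to_columnar`, `read_column`) are made up for illustration.

```python
import csv
import io
import json
import zlib

# Toy data: 1000 rows with a repetitive text column, made up for illustration.
CITIES = ["Berlin", "Hamburg", "Munich"]
rows = [{"id": i, "city": CITIES[i % 3], "temp": 20 + i % 5} for i in range(1000)]


def to_csv_bytes(records):
    """Row-oriented plaintext: every value of every row is written out in full."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "city", "temp"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue().encode("utf-8")


def to_columnar(records):
    """Column-oriented toy store: each column is serialized and compressed on its own."""
    columns = {}
    for name in records[0]:
        values = [r[name] for r in records]
        columns[name] = zlib.compress(json.dumps(values).encode("utf-8"))
    return columns


def read_column(store, name):
    """Decompress and decode a single column; the other columns are never touched."""
    return json.loads(zlib.decompress(store[name]).decode("utf-8"))


csv_bytes = to_csv_bytes(rows)
store = to_columnar(rows)
columnar_size = sum(len(chunk) for chunk in store.values())

print("csv bytes:     ", len(csv_bytes))
print("columnar bytes:", columnar_size)
print("first temps:   ", read_column(store, "temp")[:5])
```

In practice you would not roll this yourself. Libraries such as pyarrow or pandas read and write real Parquet files, and for example `pandas.read_parquet` accepts a `columns` argument so only the requested columns are loaded.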