There is one name in the data version control space I'm truly jealous of: DVC, short for “Data Version Control”. Data version control is only a small part of what DVC does. DVC is a tool to version code and data in machine learning pipelines. There are other tools that version control data. However, Google doesn't seem to think so. We have new technology to do true version control (i.e. logs, diffs, and merges) on large scale data. This blog attempts to survey the space and give you a better picture of what's out there.

What do you mean by Data Version Control?

What do you mean by data? Do you mean data in files or data in tables? Do you mean unstructured data like images or text from web pages? Do you mean CSV tables or JSON blobs? Do you mean big data like time series log entries? Do you mean relational databases? If relational, do you care about schema or just data (or vice versa)? Do you mean data transformations, like exist in data pipelines?

Do you have an application in mind? Data for machine learning (i.e. labeled data)? Data for visualizations and reports? Data for a software application?

What do you mean when you say version control? Which parts of version control do you care about? Do you care about rollback? Diffs? Lineage, i.e. who changed what and when? Branch/merge? Sharing changes with others?

Do you mean a content-addressed version of the above with all the good distributed qualities that solution provides?

As you can see, “Data Version Control” quickly gets complicated. Let's parse through the offerings that could be “Data Version Control” and make sense of who is building what.

When researching for this blog entry we encountered a number of products with some relationship to “Data Version Control”. We narrowed the list to six products across four general categories. For each product selected, we'll provide an introduction to what the product does so you can judge for yourself whether it fits your definition of “Data Version Control”.

The products fell into four general categories:

Version Control

Git

Tagline: "Fast, scalable, distributed revision control system"
Initial Release: Ap
GitHub: (mirror)