PyGotham 19: data privacy, feature flags, Django + ElasticSearch
Diogo de Miranda
October 3, 2019
<p>We are arriving in New York! Part of our team is on their way to <a href="https://2019.pygotham.org/">PyGotham 2019</a>, the biggest event of the Python community in New York. The experience last year was amazing, so we decided to come back. We are also sponsoring it this year, so if you are going to the event, make sure to stop by our booth: we are bringing lots of cool swag and some Brazilian coffee!</p><p>We will also give 5 talks at the conference! We hope to see you there, and please feel free to reach out to us anytime during the conference to ask questions. For more information, here are our talks' slides and outlines.</p><h3 id="pull-requests-merging-good-practices-into-your-project">Pull Requests: Merging Good Practices Into your Project</h3><ul><li>Speaker: <a href="https://twitter.com/lucabezerra_">Luca Bezerra</a></li><li>Link to Slides: <a href="https://drive.google.com/file/d/1qHMsrpsGzi-N9XhIKKTRXMmq2jiXIffc/view">Pull Requests: Merging Good Practices Into your Project</a></li></ul><p>Although known by most, Pull Requests are often not dealt with in the most effective way. Believe it or not, there are teams that don’t review code at all! People may assume that a senior developer is experienced enough to not make any mistakes, or that merely changing those 3 lines of code couldn’t possibly do any harm to the system. In these cases, it’s not uncommon to skip the code review in order to save some time. Unreviewed (or badly reviewed) code can be extremely dangerous, resulting in huge risks and unpredictable behavior.</p><p>A survey says that, on average, developers spend 45% of their time fixing bugs and technical debt, when they could be developing new features instead. Defining simple guideline files, adopting certain behaviors and setting up repository configurations are steps that can improve code review performance manyfold (in both time and quality). Using review tools both on the server (e.g. Heroku Review Apps) and locally (e.g.
linters) can also greatly speed up the process. Creating templates and checklists ensures no step is overlooked or forgotten. The list goes on, but enough spoilers for now. The attendees will learn specific tips, tools, processes and recommended practices that were compiled from research and real-life use cases (both from my experience and from big players like Django, Facebook, Mozilla, etc.), along with some survey data that demonstrates why reviewing code is important.</p><h3 id="jane-doe-will-help-you-improve-your-project">Jane Doe will help you improve your project</h3><ul><li>Speaker: <a href="https://twitter.com/_rebecasarai">Rebeca Sarai</a></li><li>Link to Slides: <a href="https://docs.google.com/presentation/d/1d1AEIg9_GLCL62E8Nkfcu4W5UyNXK8mR8ynDqcaomMo/edit?usp=sharing">Jane Doe will help you improve your project</a></li></ul><p>In these days of privacy scandals, the concern about securing customers’ data is bigger than ever, and the solution is far from simply locking everything in a safe box. Sharing data is inevitable. In this talk, we will approach the data anonymity problem, exploring how to use anonymization techniques to secure users’ personal information when analyzing, testing, processing, or sharing a database.</p><p>Customers’ data is important. The number of privacy laws has grown from 20 to 100 in recent years; to name a few examples: PCI compliance in the payment industry, the European GDPR regulation, and the Brazilian LGPD. All these new regulations attempt to bridge an old gap: data anonymity. How do we handle data and protect the individuals represented in it? Companies often face lawsuits to compensate for personal information breaches in their databases.</p><p>Code must be tested. In a classic development workflow, production data is often copied onto test, QA or staging environments, where it is exposed to the eyes of testers, receivers, or unauthorized developers on machines less protected than production environments.
It is also not uncommon for files to be shared with external partners, who often require only a small part of the data transferred, and granting access to users’ data might be a breach. If, on the one hand, sharing data is both necessary and inevitable, on the other, technologies that ensure the privacy of individuals’ details are no longer only desirable, but essential.</p><p>A Jane Doe is a person without a name who is able to perform actions even without any recollection of personal information. We will use this principle to approach two important areas in software development: how to streamline the testing of complex systems and how to manage data whilst securing users’ personal information. We will create a boilerplate project to expose different techniques of anonymization and pseudonymization, showing that solving the anonymity problem is much more complex than replacing names, last names, and social security numbers - all while avoiding bottlenecks in Django projects.</p><h3 id="taming-irreversibility-with-feature-flags">Taming Irreversibility with Feature Flags</h3><ul><li>Speaker: <a href="https://twitter.com/hugoabessa">Hugo Bessa</a></li><li>Link to Slides: <a href="https://docs.google.com/presentation/d/1O4UjUFL39CIKcAXvIz5SUO_Yi1uj8Csc7EAplp1T-4Y/edit?usp=drivesdk">Taming Irreversibility with Feature Flags</a></li></ul><p>It’s been 10 years since Flickr’s development team documented the use of Feature Flags in their software. Tech giants like Google and Facebook have also stated their use, yet weirdly enough there seem to be only a few in the community benefiting from feature flipping.</p><p>Flags make it possible to toggle whole features on and off without touching the code. This can help the development team not only by cutting down on response time to disasters but also by giving developers peace of mind.
There are also great improvements in code sync frequency and in the launch flow of new features - especially in applications with a large number of users.</p><p>Along with these great benefits, feature flags also raise some concerns: there are multiple strategies to implement them and numerous new things to worry about when developing new gated features. These range from the tools you can use to store and retrieve your flags to the way you can maintain your application’s consistency in edge-case scenarios.</p><p>This talk focuses on some of the benefits and challenges faced when using feature flags on team projects, and how to extract their best value without losing sight of code quality.</p><h3 id="building-effective-django-queries-with-expressions">Building effective Django queries with expressions</h3><ul><li>Speaker: <a href="https://twitter.com/vcfbarreiros">Vanessa Barreiros</a></li><li>Link to Slides: <a href="https://docs.google.com/presentation/d/1HhISje4VyaQcjElRDsrKkZpIMuJQy9uLbGhfkuVswDI/edit?usp=sharing">Building effective Django queries with expressions</a></li></ul><p>ORMs are known to be a powerful tool for manipulating databases with ease. In Django, there is a set of out-of-the-box abstractions to help perform queries and shape them through annotations, aggregations, ordering, and so on, hence saving one’s time. A common solution to filtering when models grow larger over time is creating redundant fields; a better solution is using Django’s built-in resources called query expressions.</p><p>Query expressions are smart yet straightforward functions that one can use to compute values on query execution and perform string manipulation and calculations, among other things, thus removing the burden of having unnecessary extra columns in our database.
Using query expressions effectively can help to generate performant queries, avoiding potential inconsistencies and separating concerns.</p><p>This talk focuses on further optimizing Django queries by walking through code comparisons and examples with a dataset, diving into subjects such as custom database functions, conditional expressions, and filtering, so as to answer questions about the data.</p><h3 id="django-elasticsearch-without-invalidation-logic">Django + ElasticSearch without invalidation logic</h3><ul><li>Speaker: <a href="https://twitter.com/flaviojuvenal">Flavio Juvenal</a></li><li>Link to Slides: <a href="https://docs.google.com/presentation/d/1ZA6j4qgm1xKJ60MgHhk4ghejctOrE0Z59NIWHgQ5LoE/edit?usp=sharing">Django + ElasticSearch without invalidation logic</a></li></ul><p>This talk will teach you to finally integrate Django and Elasticsearch “like it’s 2019”.</p><p>Elasticsearch is a great addition to the Django developer’s toolkit: it supports performant complex full-text queries and filters on huge datasets, where traditional relational database-only solutions fall short. But integrating Django with Elasticsearch is usually a pain: you need logic to keep database tables and Elasticsearch indexes in sync. Since data is stored in two places, it can become out-of-sync if care is not taken. Dirty index data will generate wrong search results, defeating the purpose of the integration.</p><p>A new alternative is django-zombodb, a Django app that uses a Postgres extension for syncing tables with Elasticsearch indexes at transaction time. With django-zombodb, developers can treat an Elasticsearch index just like an internal Postgres index. This means no code is needed to synchronize Postgres with Elasticsearch: you just need to run a Django migration that executes a CREATE INDEX in the database and you’re done.
Any new inserts, updates or deletes on that model will be reflected in the Elasticsearch index at transaction time!</p><p>django-zombodb also offers a Pythonic/Djangonic API to make Elasticsearch queries over Django models using the ORM in a queryset-friendly way. Developers are able to compose Elasticsearch queries with regular ORM queries by just chaining queryset methods and composing Q-like objects. In this talk, you’ll learn django-zombodb’s advantages over other solutions, how it works, how to use it, and even how you can contribute to it.</p>
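<p>To make the idea of syncing an index at transaction time more concrete, here is a minimal sketch of the general principle. It is only an analogy and is not django-zombodb’s API or mechanism: it assumes Python’s standard <code>sqlite3</code> module as a stand-in for Postgres, a plain table as a stand-in for the Elasticsearch index, and database triggers doing the syncing. The table names, trigger names, and the <code>build_db</code> and <code>search</code> helpers are all hypothetical.</p>

```python
import sqlite3


def build_db():
    """Create an in-memory database whose 'index' table is kept in sync by triggers.

    Assumption: SQLite stands in for Postgres and search_index stands in for an
    external search index. This is an analogy, not how django-zombodb works.
    """
    # isolation_level=None lets the caller control transactions with BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(
        """
        CREATE TABLE restaurant (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE search_index (
            restaurant_id INTEGER PRIMARY KEY,
            terms TEXT NOT NULL
        );

        -- The triggers run inside the same transaction as the write itself,
        -- so the application needs no invalidation logic of its own.
        CREATE TRIGGER restaurant_ai AFTER INSERT ON restaurant BEGIN
            INSERT INTO search_index VALUES (NEW.id, lower(NEW.name));
        END;
        CREATE TRIGGER restaurant_au AFTER UPDATE ON restaurant BEGIN
            UPDATE search_index SET terms = lower(NEW.name)
            WHERE restaurant_id = NEW.id;
        END;
        CREATE TRIGGER restaurant_ad AFTER DELETE ON restaurant BEGIN
            DELETE FROM search_index WHERE restaurant_id = OLD.id;
        END;
        """
    )
    return conn


def search(conn, term):
    """Return restaurant names whose indexed terms contain the given term."""
    rows = conn.execute(
        "SELECT r.name FROM search_index s "
        "JOIN restaurant r ON r.id = s.restaurant_id "
        "WHERE s.terms LIKE ? ORDER BY r.id",
        (f"%{term.lower()}%",),
    )
    return [name for (name,) in rows]
```

<p>Because the index rows are written in the same transaction as the model rows, a rolled-back insert never shows up in search results, and updates and deletes are reflected without any extra application code. That is the property the talk abstract describes, shown here under the simplifying assumptions above.</p>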