airflow

Airflow KubernetesPodOperator: pass securityContext parameter

Submitted by 不问归期 on 2019-12-11 10:38:23
Question: Could anyone give me an example of passing parameters such as "runAsNonRoot" when creating a pod through KubernetesPodOperator? I've tried to dig through the documentation, but it isn't clear. Answer 1: At present this does not appear to be supported by the operator. You can see that the KubernetesPodOperator has an __init__ that creates a PodGenerator, then adds all the volumes and mounts to it before generating the pod. At no point does it call the only method through which you could pass a SecurityContext, add_init
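For reference, a minimal sketch of what the question is after, assuming a later Airflow release (roughly 1.10.3+) in which KubernetesPodOperator accepts a security_context dict; as the answer notes, the version under discussion does not expose this, so the parameter and values below are illustrative only.

```python
# Sketch only: assumes an Airflow release in which KubernetesPodOperator
# accepts a `security_context` dict; earlier versions, as the answer
# notes, do not expose this.
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

run_as_non_root = KubernetesPodOperator(
    task_id="run-as-non-root",
    name="run-as-non-root",
    namespace="default",
    image="python:3.7-slim",
    cmds=["python", "-c", "print('hello')"],
    # Pod-level security context, passed through to the generated pod spec.
    security_context={
        "runAsNonRoot": True,
        "runAsUser": 1000,
    },
    dag=dag,  # assumes a `dag` object defined elsewhere
)
```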

Airflow connection type File (path)

Submitted by 纵饮孤独 on 2019-12-11 08:53:13
Question: Hello everyone, I'm playing with Airflow and reading this helpful tutorial. I'm asking for help to better understand how Admin -> Connections works with regard to Conn Type: File (path). I assume this type of connection is meant to make a local filesystem folder accessible to my operator? Source: https://stackoverflow.com/questions/50151610/airflow-connection-type-file-path
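Roughly, yes: the File (path) connection stores a base path on the local filesystem, and it is typically consumed by a sensor such as FileSensor via its fs_conn_id argument. A minimal sketch, with the connection id, base path, and file name all illustrative rather than taken from the question:

```python
# Minimal sketch: a File (path) connection named "fs_default" whose extra
# field contains a base path (e.g. {"path": "/data/inbound"}) can be used
# by FileSensor, which waits for a file resolved against that base path.
from datetime import datetime

from airflow import DAG
from airflow.contrib.sensors.file_sensor import FileSensor

dag = DAG("file_sensor_example", start_date=datetime(2018, 1, 1), schedule_interval=None)

wait_for_file = FileSensor(
    task_id="wait_for_file",
    fs_conn_id="fs_default",       # Admin -> Connections, Conn Type: File (path)
    filepath="incoming/data.csv",  # resolved relative to the connection's path
    poke_interval=60,
    dag=dag,
)
```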

Set Airflow Env Vars at Runtime

Submitted by *爱你&永不变心* on 2019-12-11 08:45:37
Question: If I set env vars corresponding to Airflow config settings after executing the airflow binary, at the point when DAG definitions are being loaded into memory, will this have the same effect as setting those env vars at the OS level before executing the binary? Answer 1: I wasn't able to find any documentation on whether this works as intended, and I figured that if I had to read through the source to find out, it's probably not a good idea to be doing it in the first
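For context, Airflow config settings map to environment variables named AIRFLOW__{SECTION}__{KEY}. A small sketch of the convention only; whether setting them from inside a DAG file (as below) rather than exporting them before starting the process takes effect depends on when the corresponding value is read and cached, which is exactly the uncertainty the question raises.

```python
# Sketch of the AIRFLOW__{SECTION}__{KEY} naming convention.
# Exporting these at the OS level before starting `airflow webserver` /
# `airflow scheduler` is the documented approach; setting them later from
# a DAG file, as shown here, is what the question asks about and may not
# behave the same way.
import os

# Maps to [core] dags_folder in airflow.cfg
os.environ["AIRFLOW__CORE__DAGS_FOLDER"] = "/opt/airflow/dags"
# Maps to [core] parallelism
os.environ["AIRFLOW__CORE__PARALLELISM"] = "16"
```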

Error installing apache-airflow on windows 10 anaconda

Submitted by 断了今生、忘了曾经 on 2019-12-11 08:27:01
Question: I'm trying to install the apache-airflow[all_dbs] package in my Anaconda env (Python 3.6), but I keep getting the same error when it tries to install psutil. I've already tried reinstalling the 2015 and 2017 Microsoft build tools, but that didn't fix the problem. This is the message I get when trying to install the package: Running setup.py install for psutil ... error Complete output from command c:\users\pedrodaumas\anaconda3\envs\env36\python.exe -u -c "import setuptools, tokenize;__file__='C:

Airflow webserver not starting in 1.10

Submitted by 戏子无情 on 2019-12-11 08:01:32
Question: I'm trying to migrate from Airflow 1.9 to Airflow 1.10. After some effort I was able to install the new version, but I cannot load the web UI. When I try to start the webserver, it gives an error about a file being busy. I have not started any process or operation that might be locking a file, and airflow initdb works just fine. Error: ubuntu@ubuntu-xenial:~/airflow$ airflow webserver [2018-09-06 18:46:19,916] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle

Airflow Scheduler not picking up DAG Runs

Submitted by 孤人 on 2019-12-11 07:16:28
Question: I'm setting up Airflow so that the webserver runs on one machine and the scheduler runs on another. Both share the same MySQL metastore database. Both instances come up without any errors in the logs, but the scheduler is not picking up any DAG Runs created by manually triggering the DAGs via the web UI. The dag_run table in MySQL shows a few entries, all in the running state: mysql> select * from dag_run; +----+--------------------------------+----------------------------+---------+-------------
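A common cause in split webserver/scheduler setups is that one of the two processes resolves a different executor or metadata DB than intended. A hedged diagnostic sketch, to be run on both machines, that only prints what each instance actually sees:

```python
# Diagnostic sketch: run on both the webserver and the scheduler machine
# and compare the output. A SequentialExecutor or mismatched
# sql_alchemy_conn is a frequent reason manually triggered runs stay in
# the "running" state with no tasks being scheduled.
from airflow.configuration import conf

print("executor         :", conf.get("core", "executor"))
print("sql_alchemy_conn :", conf.get("core", "sql_alchemy_conn"))
print("dags_folder      :", conf.get("core", "dags_folder"))
```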

Reversed upstream/downstream relationships when generating multiple tasks in Airflow

Submitted by て烟熏妆下的殇ゞ on 2019-12-11 07:14:50
Question: The original code related to this question can be found here. I'm confused by how both the bitshift operators and the set_upstream / set_downstream methods behave within a task loop I've defined in my DAG. When the main execution loop of the DAG is configured as follows: for uid in dash_workers.get_id_creds(): clear_tables.set_downstream(id_worker(uid)) or for uid in dash_workers.get_id_creds(): clear_tables >> id_worker(uid) the graph looks like this (the alpha-numeric sequence are the user
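For reference, a hedged re-creation of the looped pattern the question describes; the names id_worker and get_id_creds come from the question, but their bodies here are stand-ins. Both forms are equivalent and should produce a fan-out, with clear_tables upstream of every per-user task.

```python
# Re-creation of the question's loop with stand-in task bodies.
# clear_tables >> id_worker(uid) is the same as
# clear_tables.set_downstream(id_worker(uid)): a fan-out from clear_tables.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG("fan_out_example", start_date=datetime(2018, 1, 1), schedule_interval=None)

clear_tables = DummyOperator(task_id="clear_tables", dag=dag)

def id_worker(uid):
    # Stand-in for the question's per-user task factory.
    return DummyOperator(task_id="id_worker_{}".format(uid), dag=dag)

for uid in ["u1", "u2", "u3"]:       # stand-in for dash_workers.get_id_creds()
    clear_tables >> id_worker(uid)   # equivalent to set_downstream
```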

Running Airflow Tasks In Parallel - Nothing Gets Scheduled

Submitted by 放肆的年华 on 2019-12-11 07:08:18
Question: I just went through the process of configuring my Airflow setup for parallel processing by following this article and this article. Everything seems to be working fine in the sense that I was able to run all of the commands from the articles without any errors, warnings, or exceptions. I was able to start the airflow webserver and airflow scheduler, and I can open the UI and view all my DAGs, but now none of the DAGs that previously worked are starting. I had
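Separately from the scheduling symptom, a hedged sketch of a DAG whose branches are independent and can therefore run in parallel once a non-sequential executor (LocalExecutor or CeleryExecutor, as in articles like those linked) and the parallelism settings are in place. It does not by itself fix the "nothing gets scheduled" problem.

```python
# Sketch: independent branches that an appropriately configured executor
# can run in parallel. Task names and commands are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG("parallel_example", start_date=datetime(2018, 1, 1), schedule_interval=None)

start = BashOperator(task_id="start", bash_command="echo start", dag=dag)
branch_a = BashOperator(task_id="branch_a", bash_command="sleep 10", dag=dag)
branch_b = BashOperator(task_id="branch_b", bash_command="sleep 10", dag=dag)
end = BashOperator(task_id="end", bash_command="echo done", dag=dag)

start >> [branch_a, branch_b]
[branch_a, branch_b] >> end
```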

Airflow 1.9 - Cannot get logs to write to s3

Submitted by 你说的曾经没有我的故事 on 2019-12-11 06:34:52
Question: I'm running Airflow 1.9 on Kubernetes in AWS. I would like the logs to go to S3, as the Airflow containers themselves are not long-lived. I've read the various threads and documents describing the process, but I still cannot get it working. First, a test that demonstrates to me that the S3 configuration and permissions are valid; this is run on one of our worker instances. Use airflow to write to an S3 file: airflow@airflow-worker-847c66d478-lbcn2:~$ id uid=1000(airflow) gid=1000(airflow)
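For context, the Airflow 1.9-era approach is a custom logging config module placed on PYTHONPATH and referenced from airflow.cfg. A hedged sketch of one commonly cited shape; the bucket, connection id, and exact dictionary keys should be checked against the airflow_local_settings.py template shipped with your own install.

```python
# Hedged sketch of an Airflow 1.9 custom logging config, e.g. saved as
# $AIRFLOW_HOME/config/log_config.py (with an __init__.py alongside it).
# It starts from the shipped template and swaps the task handler for the
# S3 task handler. Bucket name and keys are illustrative.
import copy

from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG

S3_LOG_FOLDER = "s3://my-airflow-logs/logs"  # illustrative bucket/prefix

LOGGING_CONFIG = copy.deepcopy(DEFAULT_LOGGING_CONFIG)
LOGGING_CONFIG["handlers"]["s3.task"] = {
    "class": "airflow.utils.log.s3_task_handler.S3TaskHandler",
    "formatter": "airflow.task",
    "base_log_folder": LOGGING_CONFIG["handlers"]["file.task"]["base_log_folder"],
    "s3_log_folder": S3_LOG_FOLDER,
    "filename_template": LOGGING_CONFIG["handlers"]["file.task"]["filename_template"],
}
LOGGING_CONFIG["loggers"]["airflow.task"]["handlers"] = ["s3.task"]

# airflow.cfg would then point at this module, roughly:
#   [core]
#   task_log_reader = s3.task
#   logging_config_class = log_config.LOGGING_CONFIG
#   remote_log_conn_id = <your S3 connection id>
```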

schedule_interval and other gotchas with SubDagOperator

Submitted by 为君一笑 on 2019-12-11 06:04:21
Question: The Airflow documentation clearly states: SubDAGs must have a schedule and be enabled. If the SubDAG's schedule is set to None or @once, the SubDAG will succeed without having done anything. Although we should stick to the documentation, I've found they work without a hiccup even with schedule_interval set to None or @once. Here's my working example. My current understanding (I heard about Airflow only 2 weeks ago) of SubDagOperators (or subdags) is that Airflow treats a subdag as just another task
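For reference, a hedged sketch of the usual SubDagOperator factory pattern the question refers to; the child DAG's schedule_interval below is the setting the documentation warns about. All ids and the dummy task are illustrative, not taken from the question's example.

```python
# Sketch of the SubDagOperator factory pattern. The child dag_id must be
# "<parent_dag_id>.<task_id>"; its schedule_interval is the value the
# documentation says should not be None or @once.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator

DEFAULT_ARGS = {"start_date": datetime(2018, 1, 1)}

def make_subdag(parent_dag_id, child_id, schedule_interval):
    subdag = DAG(
        dag_id="{}.{}".format(parent_dag_id, child_id),
        default_args=DEFAULT_ARGS,
        schedule_interval=schedule_interval,
    )
    DummyOperator(task_id="do_work", dag=subdag)
    return subdag

dag = DAG("parent_dag", default_args=DEFAULT_ARGS, schedule_interval="@daily")

section = SubDagOperator(
    task_id="section_1",
    subdag=make_subdag("parent_dag", "section_1", schedule_interval="@daily"),
    dag=dag,
)
```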